(19) Europäisches Patentamt
European Patent Office
Office européen des brevets

(11) EP 0 721 016 A2

(12) EUROPEAN PATENT APPLICATION



(43) Date of publication: 

10.07.1996 Bulletin 1996/28 

(21) Application number: 95307501.7 

(22) Date of filing: 20.10.1995 



(51) Int. Cl.⁶: C12Q 1/68, C07H 21/00



(84) Designated Contracting States: 
DE FR GB IT NL 

(30) Priority: 21.10.1994 US 327522 

24.10.1994 US 327687 

18.10.1995 US 533582 

(71) Applicant: AFFYMAX TECHNOLOGIES N.V.
Willemstad, Curacao (AN)

(72) Inventors: 

• Lockhart, David J. 

Santa Clara, California 95054 (US) 

• Chee, Mark S. 

Palo Alto, California, 94306 (US) 



• Vetter, Dirk 

D-79110 Freiburg (DE)

• Diggelmann, Martin 
CH-4435 Niederdorf (CH)

(74) Representative: Bizley, Richard Edward et al 
Hepworth, Lawrence, Bryer & Bizley 
Merlin House 
Falconry Court 
Baker's Lane 

Epping Essex CM16 5DQ (GB) 

Remarks: 

The applicant has subsequently filed a sequence listing and declared that it includes no new matter.





(54) Nucleic acid library arrays, methods for synthesizing them and methods for sequencing and 
sample screening using them 



(57) Methods for discriminating between fully complementary hybrids and those that differ by one or more base pairs, and libraries of double-stranded oligonucleotides on a solid support. In one embodiment, the present invention provides methods of using nuclease treatment to improve the quality of hybridization signals on high density oligonucleotide arrays. In another embodiment, the present invention provides methods of using ligation reactions to improve the quality of hybridization signals on high density oligonucleotide arrays. In yet another embodiment, the present invention provides libraries of unimolecular or intermolecular, double-stranded oligonucleotides on a solid support. These libraries are useful in pharmaceutical discovery for the screening of numerous biological samples for specific interactions between the double-stranded oligonucleotides and peptides, proteins, drugs and RNA. In a related aspect, the present invention provides libraries of conformationally restricted probes on a solid support. The probes are restricted in their movement and flexibility using double-stranded oligonucleotides as scaffolding. The probes are also useful in various screening procedures associated with drug discovery and diagnosis. The present invention further provides methods for the preparation and screening of the above libraries.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application is a continuation-in-part of United States Serial No. 08/327,522, filed October 21, 1994, and United States Serial No. 08/327,687, filed October 24, 1994, each of which is incorporated by reference in its entirety for all purposes.

GOVERNMENT RIGHTS 

Research leading to the invention was funded in part by NIH Grant No. , and the government 

may have certain rights to the invention. 

BACKGROUND OF THE INVENTION 

The relationship between structure and function of macromolecules is of fundamental importance in the understanding of biological systems. Such relationships are important to understanding, for example, the functions of enzymes, structural proteins, and signalling proteins, the ways in which cells communicate with one another, the mechanisms of cellular control and metabolic feedback, etc.

Genetic information is critical in the continuation of life processes. Life is substantially informationally based, and genetic content controls the growth and reproduction of the organism and its complements. Proteins, which are critical features of all living systems, are encoded by the genetic materials of the cell. More particularly, the properties of enzymes, functional proteins and structural proteins are determined by the sequence of amino acids from which they are made. As such, it has become very important to determine the genetic sequences of nucleotides which encode the enzymes, structural proteins and other effectors of biological functions. In addition to the segments of nucleotides which encode polypeptides, there are many nucleotide sequences which are involved in the control and regulation of gene expression.

The human genome project is an example of a project that is directed toward determining the complete sequence of the genome of the human organism. Although such a sequence would not necessarily correspond to the sequence of any specific individual, it will provide significant information as to the general organization and specific sequences contained within genomic segments from particular individuals. It will also provide mapping information useful for further detailed studies. The need for highly rapid, accurate, and inexpensive sequencing technology is nowhere more apparent than in a demanding sequencing project such as this. To complete the sequencing of a human genome will require the determination of approximately 3 × 10⁹, or 3 billion, base pairs.

The procedures typically used today for sequencing include the methods described in Sanger, et al., Proc. Natl. Acad. Sci. USA 74:5463-5467 (1977), and Maxam, et al., Methods in Enzymology 65:499-559 (1980). The Sanger method utilizes enzymatic elongation with chain terminating dideoxy nucleotides. The Maxam and Gilbert method uses chemical reactions exhibiting specificity of reactants to generate nucleotide specific cleavages. Both methods, however, require a practitioner to perform a large number of complex, manual manipulations. For example, such methods usually require the isolation of homogeneous DNA fragments, elaborate and tedious preparation of samples, preparation of a separating gel, application of samples to the gel, electrophoresing the samples on the gel, working up the finished gel, and analysis of the results of the procedure.

Alternative techniques have been proposed for sequencing a nucleic acid. PCT patent Publication No. 92/10588, incorporated herein by reference for all purposes, describes one improved technique in which the sequence of a labeled target nucleic acid is determined by hybridization to an array of nucleic acid probes on a substrate. Each probe is located at a positionally distinguishable location on the substrate. When the labeled target is exposed to the substrate, it binds at locations that contain complementary nucleotide sequences. Through knowledge of the sequence of the probes at the binding locations, one can determine the nucleotide sequence of the target nucleic acid. The technique is particularly efficient when very large arrays of nucleic acid probes are utilized. Such arrays can be formed according to the techniques described in U.S. Patent No. 5,143,854 issued to Pirrung, et al. See also, U.S. application Serial No. 07/805,727, both of which are incorporated herein by reference for all purposes.

When the nucleic acid probes are of a length shorter than the target, one can employ a reconstruction technique to determine the sequence of the larger target based on affinity data from the shorter probes. See, U.S. Patent No. 5,202,231 issued to Drmanac, et al., and PCT patent Publication No. 89/10977 issued to Southern. One technique for overcoming this difficulty has been termed sequencing by hybridization or SBH. Assume, for example, that a 12-mer target DNA, i.e., 5'-AGCCTAGCTGAA, is mixed with an array of all octanucleotide probes. If the target binds only to those probes having an exactly complementary nucleotide sequence, only five of the 65,536 octamer probes (i.e., 3'-TCGGATCG, CGGATCGA, GGATCGAC, GATCGACT, and ATCGACTT) will hybridize to the target. Alignment of the overlapping sequences from the hybridizing probes reconstructs the complement of the original 12-mer target.

Although such techniques have been quite useful, it would be helpful to have additional methods which can effectively discriminate between fully complementary hybrids and those that differ by one or more base pairs.
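
By way of illustration only, the sequencing-by-hybridization example given above (a 12-mer target and an array of all octamer probes) can be sketched in a few lines of Python. The function names are hypothetical and the sketch assumes ideal behaviour: only exactly complementary probes hybridize, and every 7-base overlap between hybridizing probes is unique. Neither assumption is guaranteed for real targets.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq):
    """Base-by-base complement, kept in the same left-to-right order.

    A target written 5'->3' therefore yields a strand written 3'->5'.
    """
    return "".join(COMPLEMENT[base] for base in seq)


def hybridizing_probes(target_5to3, k=8):
    """Probes of length k (written 3'->5') exactly complementary to the target."""
    comp = complement(target_5to3)
    return [comp[i:i + k] for i in range(len(comp) - k + 1)]


def reconstruct(probes):
    """Chain probes that overlap by k-1 bases (assumes unique overlaps)."""
    k = len(probes[0])
    suffixes = {p[1:] for p in probes}
    by_prefix = {p[:-1]: p for p in probes}
    # The first probe is the one whose prefix is not the suffix of another probe.
    seq = next(p for p in probes if p[:-1] not in suffixes)
    for _ in range(len(probes) - 1):
        nxt = by_prefix.get(seq[-(k - 1):])
        if nxt is None:
            break
        seq += nxt[-1]
    return seq


target = "AGCCTAGCTGAA"
probes = hybridizing_probes(target)
print(probes)                           # the five octamers listed in the text
print(reconstruct(probes))              # complement of the target, written 3'->5'
print(complement(reconstruct(probes)))  # the original 12-mer target
print(4 ** 8)                           # number of possible octamer probes
```

The reconstruction step only needs the set of hybridizing probes, not their order on the array, which is why the probes are indexed by their 7-base prefixes.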

[...] other effectors of biological functions, it is important to know how such species interact. A number of biochemical processes involve the interaction of some species, e.g., a drug, a peptide or protein, or RNA, with DNA. For example, protein/DNA binding interactions are involved with a number of transcription factors as well as with tumor suppression associated with the p53 protein and the genes contributing to a number of cancer conditions. As such, it would be advantageous to have methods for preparing libraries of diverse double-stranded nucleic acid sequences and probes which can be used, for example, in screening studies for the determination of binding affinity exhibited by binding proteins, drugs or RNA.

Methods of synthesizing desired single-stranded DNA sequences are well known to those of skill in the art. In particular, methods of synthesizing oligonucleotides are found in, for example, Oligonucleotide Synthesis: A Practical Approach, Gait, ed., IRL Press, Oxford (1984), incorporated herein by reference in its entirety for all purposes. Synthesizing unimolecular double-stranded DNA in solution has also been described. See, Durand, et al., Nucleic Acids Res., and Thomson, et al., Nucleic Acids Res. 21:5600-5603 (1993), the disclosures of both being incorporated herein by reference.

Solid phase synthesis of biological polymers has been evolving since the early "Merrifield" solid phase peptide synthesis, described in Merrifield, J. Am. Chem. Soc. 85:2149-2154 (1963), incorporated herein by reference for all purposes. Solid-phase synthesis techniques have been provided for the synthesis of several peptide sequences on, for example, a number of "pins." See, e.g., Geysen, et al., J. Immun. Meth. 102:259-274 (1987), incorporated herein by reference for all purposes. Other solid-phase techniques involve, for example, synthesis of various peptide sequences on different cellulose disks supported in a column. See, Frank and Doring, Tetrahedron 44:6031-6040 (1988), incorporated herein by reference for all purposes. Still other solid-phase techniques are described in U.S. Patent No. 4,728,502 issued to Hamill and WO 90/00626 (Beattie, inventor). Unfortunately, each of these techniques produces only a relatively low density array of polymers. For example, the technique described in Geysen, et al. is limited to producing 96 different polymers on pins spaced in the dimensions of a standard microtiter plate.

Improved methods of forming large arrays of oligonucleotides, peptides and other polymer sequences in a short period of time have been devised. Of particular note, Pirrung, et al., U.S. Patent No. 5,143,854 (see also PCT Application No. WO 90/15070) and Fodor, et al., PCT Publication No. WO 92/10092, all incorporated herein by reference, disclose methods of forming vast arrays of peptides, oligonucleotides and other polymer sequences using, for example, light-directed synthesis techniques. See also, Fodor, et al., Science, 251:767-777 (1991), incorporated herein by reference for all purposes. These procedures are now referred to as VLSIPS™ procedures.

More particularly, in the Fodor, et al., PCT application, an elegant method is described for using a computer-controlled system to direct a VLSIPS™ procedure. Using this approach, one heterogenous array of polymers is converted, through simultaneous coupling at a number of reaction sites, into a different heterogenous array. See, U.S. Application Serial Nos. 07/796,243 and 07/980,523, the disclosures of which are incorporated herein for all purposes.

Although such techniques have been quite useful, it would be advantageous to have additional methods for preparing libraries of diverse double-stranded nucleic acid sequences and probes which can be used, for example, in screening studies for the determination of binding affinity exhibited by binding proteins, drugs or RNA.

SUMMARY OF THE INVENTION

In one embodiment, the present invention provides methods of using nuclease treatment to improve the quality of hybridization signals on high density oligonucleotide arrays. More particularly, in one such method, an array of oligonucleotides is combined with a labelled target nucleic acid to form target-oligonucleotide hybrid complexes. Thereafter, the target-oligonucleotide hybrid complexes are treated with a nuclease and, in turn, the array of target-oligonucleotide hybrid complexes is washed to remove non-perfectly complementary target-oligonucleotide hybrid complexes. Following nuclease treatment, the target:oligonucleotide hybrid complexes which are perfectly complementary are more readily identified. From the location of the labelled targets, the oligonucleotide probes which hybridized with the targets can be identified and, in turn, the sequence of the target nucleic acid can be more readily determined or verified.

In another embodiment, the present invention provides methods wherein ligation reactions are used to discriminate between fully complementary hybrids and those that differ by one or more base pairs. In one such method, an array of oligonucleotides is generated on a substrate (in the 3' to 5' direction) using one of the methods described herein. Each of the oligonucleotides in the array is shorter in length than the target nucleic acid so that, when hybridized to the target nucleic acid, the target nucleic acid generally has a 3' overhang. In this embodiment, the target nucleic acid is not necessarily labelled. After the array of oligonucleotides has been combined with the target nucleic acid to form target-oligonucleotide hybrid complexes, the complexes are contacted with a ligase and with a pool of labelled, ligatable probes. The ligation reaction of the labelled, ligatable probes to the 5' end of the oligonucleotide probes on the substrate will occur, in the presence of the ligase, only when the target:oligonucleotide hybrid has formed with correct base-pairing near the 5' end of the oligonucleotide probe and where there is a suitable 3' overhang of the target nucleic acid to serve as a template for hybridization and ligation. After the ligation reaction, the substrate is washed (multiple times if necessary) with water at a temperature sufficient to remove the unbound target nucleic acid and the labelled, unligated probes. Thereafter, a quantitative fluorescence image is obtained and the labelled oligonucleotide probes, i.e., the oligonucleotide probes which are perfectly complementary to the target nucleic acid, are identified.
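
The two conditions stated above for the ligation reaction (correct base-pairing near the 5' end of the surface-bound probe, and a suitable 3' overhang of the target) can be expressed as a simple predicate. The following Python sketch is illustrative only. The function name, the `window` of bases inspected near the 5' end, and the `min_overhang` length are assumptions introduced here and are not values taken from this specification; ligase kinetics and mismatches elsewhere in the hybrid are ignored.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def ligation_expected(probe_3to5, target_5to3, offset, window=3, min_overhang=6):
    """Return True when both stated conditions for ligation are met.

    probe_3to5   -- surface-bound probe written 3'->5' (3' end on the substrate)
    target_5to3  -- target written 5'->3'
    offset       -- index in the target paired with the probe's 3'-terminal base
    window       -- hypothetical number of bases checked near the probe's 5' end
    min_overhang -- hypothetical overhang length needed to template a ligatable probe
    """
    end = offset + len(probe_3to5)
    if offset < 0 or end > len(target_5to3):
        return False  # the probe does not lie fully on the target
    paired_region = target_5to3[offset:end]
    # Condition 1: correct base-pairing near the 5' end of the probe.
    near_5prime_ok = all(
        COMPLEMENT[t] == p
        for t, p in zip(paired_region[-window:], probe_3to5[-window:])
    )
    # Condition 2: a suitable 3' overhang of the target beyond the probe's 5' end.
    overhang = len(target_5to3) - end
    return near_5prime_ok and overhang >= min_overhang


target = "AGCCTAGCTGAA"
print(ligation_expected("TCGGAT", target, offset=0))  # matched 5' end, 6-base overhang
print(ligation_expected("TCGGAA", target, offset=0))  # mismatch at the probe's 5' end
print(ligation_expected("CGACTT", target, offset=6))  # matched, but no 3' overhang
```

In this model a probe that pairs perfectly but sits at the very 3' end of the target is not labelled, which mirrors the requirement for a template overhang.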

In yet another embodiment, the present invention provides libraries of unimolecular, double-stranded oligonucleotides. Each member of the library is comprised of a solid support, an optional spacer for attaching the double-stranded oligonucleotide to the solid support and for providing sufficient space between the double-stranded oligonucleotide and the solid support for subsequent binding studies and assays, an oligonucleotide attached to the spacer and further attached to a second complementary oligonucleotide by means of a flexible linker, such that the two oligonucleotide portions exist in a double-stranded configuration. More particularly, the members of the libraries of the present invention can be represented by the formula:

Y-L¹-X¹-L²-X²

in which Y is a solid support, L¹ is a bond or a spacer, L² is a flexible linking group, and X¹ and X² are a pair of complementary oligonucleotides.

In a specific aspect of the invention, the library of different unimolecular, double-stranded oligonucleotides can be used for screening a sample for a species which binds to one or more members of the library.

In another embodiment, the present invention provides a library of different conformationally-restricted probes attached to a solid support. The individual members each have the formula:

-X¹¹-Z-X¹²

in which X¹¹ and X¹² are complementary oligonucleotides and Z is a probe having sufficient length such that X¹¹ and X¹² can form a double-stranded oligonucleotide portion of the member and thereby restrict the conformations available to the probe.

In a specific aspect of the invention, the library of different conformationally-restricted probes can be used for screening a sample for a species which binds to one or more probes in the library.

In yet another embodiment, the present invention provides libraries of intermolecular, doubly-anchored, double-stranded oligonucleotides, each member of the library having the formula:

[Structural formula: a solid support Y bearing two arms, L¹-X¹ and L²-X², with a dashed line joining X¹ and X²]

in which Y represents a solid support, X¹ and X² represent a pair of complementary or partially complementary oligonucleotides, and L¹ and L² each represent a bond or a spacer. Typically, L¹ and L² are the same and are spacers having sufficient length such that X¹ and X² can form a double-stranded oligonucleotide. The non-covalent binding which exists between X¹ and X² is represented by the dashed line.

According to yet another aspect of the present invention, methods and devices for the electronic detection of [...]

A further understanding of the nature and advantages of the inventions herein may be realized by reference to the remaining portions of the specification and the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates discrimination of non-perfectly complementary target:oligonucleotide hybrids using RNase A.
FIG. 2 illustrates discrimination of non-perfectly complementary target:oligonucleotide hybrids using a ligation reaction.
FIG. 3 illustrates the light directed synthesis of an array of oligonucleotides on a substrate.
FIG. 4 illustrates a hybridization procedure which can be used prior to nuclease treatment.
FIG. 5 illustrates probe tiling strategy used to generate the probes.
FIG. 6 illustrates the results obtained from hybridization to the substrate without RNase treatment.
FIG. 7 illustrates the results obtained from hybridization to the substrate with RNase treatment.
FIG. 8 illustrates a method for improving the sequencing of the 5' end of a randomly fragmented target using [...]
FIG. 9A to 9F illustrate the preparation of a member of a library of surface-bound, unimolecular double-stranded DNA as well as binding studies with receptors having specificity for either the double stranded DNA portion, a probe which is held in a conformationally restricted form by DNA scaffolding, or a bulge or loop region of RNA.
FIG. 10A to 10F illustrate the preparation of several different types of intermolecular, doubly-anchored, double-stranded oligonucleotides.
FIG. 11 illustrates the relationship between an interrogation position (I) and a corresponding nucleotide (n) in the reference sequence, and between a probe from the first probe set and corresponding probes from second, third and fourth probe sets.
FIG. 12 illustrates the segment of complementarity in a probe from the first probe set.
FIG. 13 illustrates the incremental succession of probes in a basic tiling strategy. The figure shows four probe sets, each having three probes. Note that each probe differs from its predecessor in the same set by the acquisition of a 5' nucleotide and the loss of a 3' nucleotide, as well as in the nucleotide occupying the interrogation position.
FIG. 14A illustrates the exemplary arrangement of lanes on a chip. The chip shows four probe sets, each having five probes and each having a total of five interrogation positions (I1-I5), one per probe.
FIG. 14B illustrates a tiling strategy for analyzing closely spaced mutations.
FIG. 14C illustrates a tiling strategy for avoiding loss of signal due to probe self-annealing.
FIG. 15 illustrates a hybridization pattern of a chip having probes laid down in lanes. Dark patches indicate hybridization. The probes in the lower part of the figure occur at the column of the array indicated by the arrow when the probes [...]
[...] from the other probe sets have only one of these interrogation positions.
FIG. 17A to 17C illustrate methods which can be used to prepare single-stranded nucleic acid sequences.

XIII. Alternative Embodiments

XIV. Examples 

XV. Conclusion 

I. Glossary

The following terms are intended to have the following general meanings as they are used herein: 

1. Substrate: A material having a rigid or semi-rigid surface. In many embodiments, at least one surface of the substrate will be substantially flat, although in some embodiments it may be desirable to physically separate synthesis regions for different polymers with, for example, wells, raised regions, etched trenches, or the like. In some embodiments, the substrate itself contains wells, trenches, flow through regions, etc. which form all or part of the synthesis regions. According to other embodiments, small beads may be provided on the surface, and compounds synthesized thereon may be released upon completion of the synthesis.

2. Predefined Region: A predefined region is a localized area on a substrate which is, was, or is intended to be used for formation of a selected polymer and is otherwise referred to herein in the alternative as a "reaction" region, a "selected" region, or simply a "region." The predefined region may have any convenient shape, e.g., circular, rectangular, elliptical, wedge-shaped, etc. In some embodiments, a predefined region and, therefore, the area upon which each distinct polymer sequence is synthesized is smaller than about 1 cm², more preferably less than 1 mm², and still more preferably less than 0.5 mm². In most preferred embodiments, the regions have an area less than about 10,000 µm² or, more preferably, less than 100 µm². Within these regions, the polymer synthesized therein is preferably synthesized in a substantially pure form.

3. Substantially Pure: A polymer or other compound is considered to be "substantially pure" when it exhibits characteristics that distinguish it from [...] pure such that it is the predominant [...] pure, preferably more than 10% pure, and most preferably more than 20% pure. According to more preferred aspects of the invention, the compound is greater than 80% pure, preferably more than 90% pure and more preferably more than 95% pure, where purity for this purpose refers to the ratio of the number of compound molecules formed in a region having a desired structure to the total number of non-solvent molecules in the region.
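
As a worked illustration of the purity ratio defined above, the following Python sketch computes the purity of a region and reports the highest of the stated percentage levels that it exceeds. The function names and the molecule counts are hypothetical, and only the levels legible in the definition (10%, 20%, 80%, 90% and 95%) are used.

```python
# Purity levels named in the definition, highest first.
TIERS = (0.95, 0.90, 0.80, 0.20, 0.10)


def region_purity(desired_molecules, total_non_solvent_molecules):
    """Ratio of molecules with the desired structure to all non-solvent molecules."""
    if total_non_solvent_molecules <= 0:
        raise ValueError("the region must contain at least one non-solvent molecule")
    if not 0 <= desired_molecules <= total_non_solvent_molecules:
        raise ValueError("desired molecules must be between 0 and the total")
    return desired_molecules / total_non_solvent_molecules


def highest_tier_exceeded(purity):
    """Highest stated level that the purity is 'more than', or None."""
    for tier in TIERS:
        if purity > tier:
            return tier
    return None


purity = region_purity(960, 1000)  # hypothetical counts for one region
print(purity, highest_tier_exceeded(purity))
```

Because the definition counts molecules rather than mass, the ratio is independent of the length of the polymer being synthesized.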

4. Monomer: In general, a monomer is any member of the set of molecules which can be joined together to form an oligomer or polymer. The set of monomers useful in the present invention includes, but is not restricted to, for the example of oligonucleotide synthesis, the set of nucleotides consisting of adenine, thymine, cytosine, guanine, and uridine (A, T, C, G, and U, respectively) and synthetic analogs thereof. As used herein, monomers refers to any member of a basis set for synthesis of an oligomer. Different basis sets of monomers may be used at successive steps in the synthesis of a polymer.

5. Oligomer or Polymer: The oligomer or polymer sequences of the present invention are formed from the chemical or enzymatic addition of monomer subunits. Such oligomers include, for example, linear, cyclic, and branched polymers of nucleic acids, polysaccharides, phospholipids, and peptides having either α-, β-, or ω-amino acids, heteropolymers in which a known drug is covalently bound to any of the above, polyurethanes, polyesters, polycarbonates, polyureas, polyamides, polyethyleneimines, polyarylene sulfides, polysiloxanes, polyimides, and polyacetates. As used herein, the term oligomer or polymer is meant to include such molecules as β-turn mimetics, prostaglandins and benzodiazepines which can also be synthesized in a stepwise fashion on a solid support.

6. Peptide: A peptide is an oligomer in which the monomers are amino acids and which are joined together through amide bonds. When α-amino acids are used, they may be the L-optical isomer or the D-optical isomer. Other amino acids which are useful in the present invention include unnatural amino acids such as β-alanine, phenylglycine, homoarginine and the like. Peptides are more than two amino acid monomers long, and often longer. Standard abbreviations for amino acids are used (e.g., P for proline). These abbreviations are included in Stryer, Biochemistry, Third Ed., (1988), which is incorporated herein by reference for all purposes.

7. Oligonucleotides: An oligonucleotide is a single-stranded DNA or RNA molecule, typically prepared by synthetic means. Alternatively, naturally occurring oligonucleotides, or fragments thereof, may be isolated from their natural sources or purchased from commercial sources. Those oligonucleotides employed in the present invention will be up to 100 nucleotides in length, preferably from 6 to 30 nucleotides, although oligonucleotides of different length may be appropriate. Suitable oligonucleotides may be prepared by the phosphoramidite method described by Beaucage and Carruthers, Tetrahedron Lett., 22:1859-1862 (1981), or by the triester method according to Matteucci, et al., J. Am. Chem. Soc., 103:3185 (1981), or by other methods using either a commercial automated oligonucleotide synthesizer or VLSIPS™ technology (discussed in detail below). When oligonucleotides are referred to as "double-stranded," it is understood by those of skill in the art that a pair of oligonucleotides exist in a hydrogen-bonded, helical array typically associated with, for example, DNA. In addition to the 100% complementary form of double-stranded oligonucleotides, the term "double-stranded" as used herein is also meant to refer to those forms which include such structural features as bulges and loops, described more fully in such biochemistry texts as Stryer, Biochemistry, Third Ed., (1988), previously incorporated herein by reference for all purposes.

8. Chemical terms: As used herein, the term "alkyl" refers to a saturated hydrocarbon radical which may be straight-chain or branched-chain (for example, ethyl, isopropyl, t-amyl, or 2,5-dimethylhexyl). When "alkyl" or "alkylene" is used to refer to a linking group or a spacer, it is taken to be a group having two available valences for covalent attachment, for example, -CH2CH2-, -CH2CH2CH2-, -CH2CH2CH(CH3)CH2- and -CH2(CH2CH2)2CH2-. Preferred alkyl groups as substituents are those containing 1 to [...] atoms being particularly preferred. Preferred alkyl or alkylene groups as linking groups are those containing 1 to 20 carbon atoms, with those containing 3 to 6 carbon atoms being particularly preferred. The term "polyethylene glycol" is used to refer to those molecules which have repeating units of ethylene glycol, for example, hexaethylene glycol (HO-(CH2CH2O)5-CH2CH2OH). When the term "polyethylene glycol" is used to refer to linking groups and spacer groups, it would be understood by one of skill in the art that other polyethers or polyols could be used as well (i.e., polypropylene glycol or mixtures of ethylene and propylene glycols). The following abbreviations are used herein: phi, phenanthrenequinone diimine; phen', 5-amido-glutaric acid-1,10-phenanthroline; dppz, dipyridophenazine.

9. Protecting Group: As used herein, the term "protecting group" refers to any of the groups which are designed to block one reactive site in a molecule while a chemical reaction is carried out at another reactive site. More particularly, the protecting groups used herein can be any of those groups described in Greene, et al., Protective Groups In Organic Chemistry, 2nd Ed., John Wiley & Sons, New York, NY, 1991, incorporated herein by reference. The proper selection of protecting groups for a particular synthesis will be governed by the overall methods employed in the synthesis. For example, in light-directed synthesis the protecting groups will be photolabile protecting groups such as NVOC, MeNPOC, and those disclosed in co-pending Application PCT/US93/10162 (filed October 1993), incorporated herein by reference. In other methods, protecting groups may be removed by chemical methods and include groups such as FMOC, DMT and others known to those of skill in the art.

10. Complementary or substantially complementary: Refers to the hybridization or base pairing between nucleotides or nucleic acids, such as, for instance, between the two strands of a double stranded DNA molecule, or between an oligonucleotide primer and a primer binding site on a single stranded nucleic acid to be sequenced or amplified. Complementary nucleotides are, generally, A and T (or A and U), or C and G. Two single stranded RNA or DNA molecules are said to be substantially complementary when the nucleotides of one strand, optimally aligned and compared and with appropriate nucleotide insertions or deletions, pair with at least about 80% of the nucleotides of the other strand, usually at least about 90% to 95%, and more preferably from about 98 to 100%. Alternatively, substantial complementarity exists when an RNA or DNA strand will hybridize under selective hybridization conditions to its complement. Typically, selective hybridization will occur when there is at least about 65% complementarity over a stretch of at least 14 to 25 nucleotides, preferably at least about 75%, more preferably at least about 90% complementarity. See, M. Kanehisa, Nucleic Acids Res. 12:203 (1984), incorporated herein by reference.
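
The percentage criterion in the preceding definition can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the two strands are taken to be already optimally aligned, of equal length, and free of insertions or deletions, whereas the definition itself allows for such adjustments. The function names are hypothetical; the 80% default is the lower bound given in the definition.

```python
# Watson-Crick pairs named in the definition: A with T (or U), and C with G.
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def fraction_paired(strand_5to3, other_3to5):
    """Fraction of aligned positions forming a complementary pair.

    The second strand is written 3'->5' so that position i of each strand
    faces position i of the other (pre-aligned, ungapped input assumed).
    """
    if not strand_5to3 or len(strand_5to3) != len(other_3to5):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum((a, b) in PAIRS for a, b in zip(strand_5to3, other_3to5))
    return paired / len(strand_5to3)


def substantially_complementary(strand_5to3, other_3to5, threshold=0.80):
    """Apply the 'at least about 80%' criterion to two aligned strands."""
    return fraction_paired(strand_5to3, other_3to5) >= threshold


print(fraction_paired("AGCCTAGCTG", "TCGGATCGAC"))              # every position pairs
print(substantially_complementary("AGCCTAGCTG", "AAGGATCGAC"))  # 8 of 10 positions pair
print(substantially_complementary("AGCCTAGCTG", "AATGATCGAC"))  # 7 of 10 positions pair
```

A fuller treatment would first compute an optimal alignment permitting insertions and deletions and then apply the same ratio to the aligned positions.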

11. Stringent hybridization conditions: Such conditions will typically include salt concentrations of less than about 1 M, more usually less than about 500 mM, and preferably less than about 200 mM. Hybridization temperatures can be as low as 5°C, but are typically greater than 22°C, more typically greater than about 30°C, and preferably in excess of about 37°C. Longer fragments may require higher hybridization temperatures for specific hybridization. As other factors may dramatically affect the stringency of hybridization, including base composition and length of the complementary strands, presence of organic solvents and extent of base mismatching, the combination of parameters is more important than the absolute measure of any one alone.

12. Epitope: The portion of an antigen molecule which is delineated by the area of interaction with the subclass of receptors known as antibodies.

13. Identifier tag: A means whereby one can identify which molecules have experienced a particular reaction in the synthesis of an oligomer. The identifier tag also records the step in the synthesis series in which the molecules experienced that particular monomer reaction. The identifier tag may be any recognizable feature which is, for example: microscopically distinguishable in shape, size, color, optical density, etc.; differently absorbing or emitting of light; chemically reactive; magnetically or electronically encoded; or in some other way distinctively marked with the required information. A preferred example of such an identifier tag is an oligonucleotide sequence.

14. Ligand/Probe: A ligand is a molecule that is recognized by a particular receptor. The agent bound by or reacting with a receptor is called a "ligand," a term which is definitionally meaningful only in terms of its counterpart receptor. The term "ligand" does not imply any particular molecular size or other structural or compositional feature other than that the substance in question is capable of binding or otherwise interacting with the receptor. Also, a ligand may serve either as the natural ligand to which the receptor binds, or as a functional analogue that may act as an agonist or antagonist. Examples of ligands that can be investigated by this invention include, but are not restricted to, agonists and antagonists for cell membrane receptors, toxins and venoms, viral epitopes, hormones (e.g., opiates, steroids, etc.), hormone receptors, peptides, enzymes, enzyme substrates, substrate analogs, transition state analogs, cofactors, drugs, proteins, and antibodies. The term "probe" refers to those molecules which are expected to act like ligands but for which binding information is typically unknown. For example, if a receptor is known to bind a ligand which is a peptide β-turn, a "probe" or library of probes will be those molecules designed to mimic the peptide β-turn structure. In instances where the particular ligand associated with a given receptor is unknown, the term probe refers to those molecules designed as potential ligands for the receptor.

15. BfiSSPJor: A molecule that has an affinity for a given ligand or probe. Receptors may be naturallv-occurrino «, 
remade molecu.es. Aiso. they can be employed in their unaltered natural orlsolateTsLte JllSSZSS 
1? ateChed ' C0Va ' ent,y ° r to ■ binding mernber e^SSyTv^ 

i5SS SUSSES?? K Pl6S 01 r6CePt0rS *** Can * employed * «• ■"*■"«*■ include Sare not 
restarted to. antibodies., cell membrane receptors, monoclonal antibodies and antisera reactive with soecific an« 

S TZ~* (SUCh as , 0n viruses ' ce,ls » drugs. pdynucleotid* nuclei £ JSSS" 

cofactors. lectins, sugars, polysaccharides, cells, cellular membranes, and organelles Receotore are 22 

32? ,n f art - as r ti ' ,igands - * the term recept0,s is used herein - n ° S^?S^?iSS5t 

iTSE 6 ^ T Whe " *" P m ° ,eCUleS hSVe oon * lnBd threu » h ™ ,ecular recognSto £S?££ 

plex. Other examples of receptors which can be investigated by this invention include but aSnot reSricS to 

proteins o enzymes essential to survival of microorganisms, is useful in a new class of antibiotcTof^DartSr 
"^m use antibi08CS 39ainSt W * *"* Pr0t020a ' and ^ *h^^SSS; 
b) fiQiyrnes: For instance, the binding site of enzymes such as the enzymes resoonsible for cimiam n a , m 
transmitters. Determination of .igands or probes that bind to certain rm^SSS^iZ^^ 

m the treatment of disorders of neurotransmission. 

c) Antibodies: For instance, the invention may be useful in investigating the ligand-binding site on the antibody
molecule which combines with the epitope of an antigen of interest. Determining a sequence that mimics an
antigenic epitope may lead to the development of vaccines of which the immunogen is based on one or more of such
sequences, or lead to the development of related diagnostic agents or compounds useful in therapeutic
treatments such as for autoimmune diseases (e.g., by blocking the binding of the "self" antibodies).

d) Nucleic Acids: The invention may be useful in investigating sequences of nucleic acids acting as binding sites
for cellular proteins. Such sequences may include, e.g., enhancers or promoter sequences.

e) Catalytic Polypeptides: Polymers, preferably polypeptides, which are capable of promoting a chemical reaction
involving the conversion of one or more reactants to one or more products. Such polypeptides generally include
a binding site specific for at least one reactant or reaction intermediate and an active functionality
proximate to the binding site, which functionality is capable of chemically modifying the bound reactant. Catalytic
polypeptides are described in Lerner, R.A. et al., Science 252: 659 (1991), which is incorporated herein by reference.

f) Hormone receptors: For instance, the receptors for insulin and growth hormone. Determination of the ligands
which bind with high affinity to a receptor is useful in the development of, for example, an oral replacement of
the daily injections which diabetics must take to relieve the symptoms of diabetes, and in the other case, a
replacement for the scarce human growth hormone that can only be obtained from cadavers or by recombinant
DNA technology. Other examples are the vasoconstrictive hormone receptors; determination of those ligands
that bind to a receptor may lead to the development of drugs to control blood pressure.
g) Opiate receptors: Determination of ligands that bind to the opiate receptors in the brain is useful in the
development of less-addictive replacements for morphine and related drugs.

16. Synthetic: Produced by in vitro chemical or enzymatic synthesis. The synthetic libraries of the present invention
may be contrasted with those in viral or plasmid vectors, for instance, which may be propagated in bacterial, yeast,
or other living hosts.

17. Probe: A molecule of known composition or monomer sequence, typically formed on a solid surface, which is
or may be exposed to a target molecule and examined to determine if the probe has hybridized to the target. Also
referred to herein as an "oligonucleotide" or an "oligonucleotide probe."

18. Target: A molecule, typically of unknown composition or monomer sequence, for which it is desired to study the
composition or monomer sequence. A target may be a part of a larger molecule, such as a few bases in a longer
nucleic acid.

19. A, T, C, G, U: A, T, C, G, and U are abbreviations for the nucleotides adenine, thymine, cytosine, guanine, and
uridine, respectively.

20. Library: A collection of oligonucleotide probes of predefined nucleotide sequence, often formed
in one or more substrates, which are used in hybridization studies of target nucleic acids.
II. General Overview



In one embodiment, the present invention provides improved methods for obtaining sequence information about
nucleic acids (i.e., oligonucleotides). More particularly, the present invention provides improved methods for
discriminating between fully complementary hybrids and those that differ by one or more base pairs. The methods of the
present invention rely, in part, on the ability to synthesize or attach specific oligonucleotides at known locations
on a substrate. Such oligonucleotides are capable of interacting with specific target nucleic acid. By appropriate
labeling of these targets, the sites of the interactions between the target and the specific oligonucleotide can be
derived. Moreover, because the oligonucleotides are positionally defined, the target sequence can be reconstructed
from the sites of the interactions.

It has now been determined that reconstruction of the target sequence can be improved by using various enzymes
that catalyze oligonucleotide cleavage and ligation reactions. More particularly, discrimination between fully
complementary hybrids and those that differ by one or more base pairs can be greatly enhanced by using
various enzymes that catalyze oligonucleotide cleavage and ligation reactions.

RNase A treatment, for example, can be used to improve the quality of RNA hybridization signals on high density
oligonucleotide arrays. After the array of oligonucleotides has been combined with a target nucleic acid (RNA) to form
target-oligonucleotide hybrid complexes, the complexes are treated with RNase A to remove
non-perfectly complementary target-oligonucleotide hybrid complexes. RNase A recognizes and cuts single-stranded
RNA, including RNA in RNA:DNA hybrids that is not in a perfect double-stranded structure. As illustrated in FIG. 1, RNA
bulges, loops, and even single base mismatches can be recognized and cleaved by RNase A. Similarly, treatment with
single-strand DNA nucleases (e.g., S1 nuclease and Mung Bean nuclease) can be used to improve the DNA hybridization
signals on high density oligonucleotide arrays. As such, nuclease treatment can be used to improve the quality of
hybridization signals on high density oligonucleotide arrays and, in turn, to more accurately determine the sequence,
or monitor mutations, or resequence the target nucleic acid.

Discrimination between fully complementary hybrids and those that differ by one or more base pairs can likewise be
enhanced by ligation reactions. T4 DNA ligase, for example, can be used to identify DNA:DNA hybrids that are perfectly
complementary near the 5' end of the immobilized oligonucleotide probes. The ligation reaction of labelled, short
oligonucleotides to the 5' end of oligonucleotide probes on a substrate will occur, in the presence of a ligase, only
when a target-oligonucleotide hybrid has formed with correct base-pairing near the 5' end of the probe and
there is a suitable 3' overhang of the target to serve as a template for hybridization and ligation. As such, after the
array of oligonucleotides has been combined with a target nucleic acid to form target-oligonucleotide hybrid
complexes, the target-oligonucleotide hybrid complexes can be contacted with a ligase and a labelled, ligatable probe.
After the ligation reaction, the substrate is washed to remove the target nucleic acid and labelled, unligated
oligonucleotide probes. The oligonucleotide probes containing the label indicate sequences which are perfectly
complementary to the target nucleic acid sequence. As such, as illustrated in FIG. 2, ligation reactions can be used to
improve discrimination of base-pair mismatches near the 5' end of the probe, mismatches that are often poorly
discriminated following hybridization alone.

In addition to providing improved methods for discriminating between fully complementary hybrids and those that
differ by one or more base pairs, the present invention provides methods for the preparation of high-density arrays of
diverse unimolecular and intermolecular double-stranded oligonucleotides, as well as arrays of conformationally
restricted probes. The broad concept of the unimolecular aspect is illustrated in FIG. 9. FIGS. 9A, 9B and 9C
illustrate the preparation of surface-bound unimolecular double stranded DNA, while FIGS. 9D, 9E and 9F illustrate
uses for the libraries of the present invention.

FIG. 9A shows a solid support 1 having an attached spacer 2, which is optional. Attached to the distal end of the
spacer is a first oligomer 3, which can be attached as a single unit or synthesized on the support or spacer in a
monomer by monomer approach. FIG. 9B shows a subsequent stage in the preparation of one member of a library according to
the present invention. In this stage, a flexible linker 4 is attached to the distal end of the oligomer 3. In other
embodiments, the flexible linker will be a probe. FIG. 9C shows the completed surface-bound unimolecular double
stranded DNA, in which a second oligomer 5 is attached to the distal end of the flexible linker (or probe) 4.
As shown in FIG. 9C, the length of the flexible linker (or probe) 4 is sufficient such that the first and second
oligomers (which are complementary) exist in a double-stranded conformation. It will be appreciated by one of skill in
the art that the libraries of the present invention will contain multiple, individually synthesized members which can
be screened for various types of activity. Three such binding events are illustrated in FIGS. 9D, 9E and 9F.

In FIG. 9D, a receptor 6, which can be a protein, RNA molecule or other molecule which is known to bind to DNA,
is introduced to the library. Determining which member of a library binds to the receptor provides information which is
useful for diagnosing diseases, sequencing DNA or RNA, identifying drugs and/or proteins that bind DNA, identifying
genetic characteristics, or in other drug discovery endeavors.

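In FIG. 9E, a library prepared according to the present invention is shown in which the linker 4 is a probe for which
binding information is sought. The probe is held in a conformationally
restricted manner by the flanking oligomers 3 and 5, which are present in a double-stranded conformation. As a result,
a library of conformationally restricted probes can be screened for binding activity with a receptor or ligand 7 which
has specificity for the probe.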

The present invention also contemplates the preparation of libraries of unimolecular, double-stranded
oligonucleotides having bulges or loops in one of the strands, as depicted in FIG. 9F. In FIG. 9F, one oligonucleotide 5
is shown as having a bulge 8. Specific RNA bulges are often recognized by proteins (e.g., TAR RNA is recognized by the
TAT protein of HIV). Accordingly, libraries of RNA bulges or loops are useful in a number of diagnostic applications.
One of skill in the art will appreciate that the bulge or loop can be present in either oligonucleotide portion 3 or 5.

The present invention also provides libraries of intermolecular, double-stranded oligonucleotides. The broad concept
of this aspect of the invention is illustrated in FIG. 10. As with the above described "unimolecular" aspect of the
invention, FIG. 10A shows a solid support 11 having an attached spacer 12, which is optional. Attached to the distal
end of the spacer is a first oligomer 13, which can be attached as a single unit or synthesized on the support or
spacer in a monomer by monomer approach. FIG. 10B shows a subsequent stage in the preparation of one member of a library
according to the present invention. In this stage, a second oligomer 14, which is complementary to the first
oligomer 13, is attached to the solid support. The second oligomer can also be attached as a single unit or synthesized
on the support in a monomer by monomer approach. In one embodiment, the first and second oligomers are synthesized on
the solid support in a protected form. Removal of the protecting groups provides complementary oligomers in close
proximity such that they can form an intermolecular double stranded oligonucleotide (FIG. 10C). FIG. 10D shows one
member of a library in which the first oligomer is 5'-AAAAATTTTT-3' and its identical neighboring oligomer is
3'-TTTTTAAAAA-5'. In other embodiments, the complementary oligomers will exhibit complementarity only over their
respective termini, as shown in FIG. 10E. It will be appreciated by one of skill in the art that the libraries of the
present invention will contain multiple, individually synthesized members which can be screened for various types of
activity or can serve as templates for hybridization enhancement.
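
The FIG. 10D example relies on an oligomer such as 3'-TTTTTAAAAA-5' being able to pair with an identical neighboring copy. A minimal sketch of that check follows; the function names are hypothetical and are used only for illustration, not taken from the disclosure.

```python
# Illustrative sketch only; function names are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq_5to3: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq_5to3))

def is_self_complementary(seq_5to3: str) -> bool:
    """True if two identical copies of the strand can pair antiparallel."""
    return seq_5to3 == reverse_complement(seq_5to3)

# 3'-TTTTTAAAAA-5' read in the 5'->3' direction is AAAAATTTTT.
oligomer = "AAAAATTTTT"
print(is_self_complementary(oligomer))
```

A sequence that passes this check can form a duplex with its identical neighbor, which is the situation the intermolecular library member of FIG. 10D depicts.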

III. Methods For Generating An Array Of Oligonucleotides On A Substrate

A. The Substrate 
In the methods of the present invention, an array of diverse oligonucleotides at known locations on a single substrate
surface is employed. Essentially, any conceivable substrate can be employed in the invention. The substrate can be
organic, inorganic, biological, nonbiological, or a combination of any of these, existing as beads, particles, strands,
precipitates, gels, sheets, tubing, spheres, containers, capillaries, pads, slices, films, plates, slides, etc. The
substrate can have any convenient shape, such as a disc, square, sphere, circle, etc. The substrate is preferably flat,
but may take on a variety of alternative surface configurations. For example, the substrate may contain raised or
depressed regions on which the synthesis takes place. The substrate and its surface preferably form a rigid support on
which to carry out the reaction described herein. The substrate and its surface may also be chosen to provide
appropriate light-absorbing characteristics. The substrate may be any of a wide variety of materials including, for
example, polymers, plastics, pyrex, quartz, resins, silicon, silica or silica-based materials, carbon, metals,
inorganic glasses, inorganic crystals, membranes, etc. More particularly, the substrate may, for instance, be a
polymerized Langmuir Blodgett film, functionalized glass, Si, Ge, GaAs, GaP, SiO2, SiN4, modified silicon, or any one
of a wide variety of gels or polymers such as (poly)tetrafluoroethylene, (poly)vinylidenedifluoride, polystyrene,
polycarbonate, or combinations thereof. Other substrate materials will be readily apparent to those of skill in the
art upon review of this disclosure. In a preferred embodiment the substrate is flat glass or single-crystal silicon
with surface relief features of less than 10 Å.

In some embodiments, a predefined region on the substrate and, therefore, the area upon which each distinct material
is synthesized will have a surface area of between about 1 cm² and 10⁻¹⁰ cm². In some embodiments, the regions
have areas of less than about 10⁻¹ cm², 10⁻² cm², 10⁻³ cm², 10⁻⁴ cm², 10⁻⁵ cm², 10⁻⁶ cm², 10⁻⁷ cm², 10⁻⁸ cm²,
or 10⁻¹⁰ cm². In a preferred embodiment, the regions are between about 10 × 10 μm and 500 × 100 μm.
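
To relate the preferred region sizes above (about 10 × 10 μm to 500 × 100 μm) to areas expressed in cm², the unit conversion can be sketched as follows. This is illustrative arithmetic only; the function names are hypothetical, and the density figure is an upper bound that assumes no gaps between regions.

```python
# Illustrative arithmetic only; names are hypothetical.
UM_PER_CM = 1e4  # 1 cm = 10,000 micrometres

def region_area_cm2(width_um: float, height_um: float) -> float:
    """Area of a rectangular synthesis region, converted to cm^2."""
    return (width_um / UM_PER_CM) * (height_um / UM_PER_CM)

def regions_per_cm2(width_um: float, height_um: float) -> float:
    """Upper bound on regions per cm^2, ignoring gaps between regions."""
    return 1.0 / region_area_cm2(width_um, height_um)

for w, h in [(10, 10), (500, 100)]:
    print(f"{w} x {h} um: {region_area_cm2(w, h):.1e} cm^2, "
          f"up to {regions_per_cm2(w, h):.0f} regions per cm^2")
```

Under these assumptions a 10 × 10 μm region occupies about 10⁻⁶ cm² and a 500 × 100 μm region about 5 × 10⁻⁴ cm².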

Moreover, in some embodiments, a single substrate supports more than about 10 different monomer sequences
and preferably more than about 100 different monomer sequences, although in some embodiments more than about
10³, 10⁴, 10⁵, 10⁶, 10⁷ or 10⁸ different sequences are provided on a substrate. Of course, within a region of the
substrate in which a monomer sequence is synthesized, it is preferred that the monomer sequence be substantially pure.
In some embodiments, regions of the substrate contain polymer sequences which are at least about 1%, 5%, 10%, 15%, 20%,
25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, or 99% pure.

As previously explained, the substrate is preferably flat, but may take on a variety of alternative surface
configurations. Regardless of the configuration of the substrate surface, it is imperative that the reactants used to
generate an array of oligonucleotides in the individual reaction regions be prevented from moving to adjacent reaction
regions. Most simply, this is ensured by chemically attaching the oligonucleotides to the substrate. Moreover, this can
be ensured by providing an appropriate barrier between the various reaction regions on the substrate. A mechanical
device or physical structure can be used to define the various regions on the substrate. For example, a wall or other
physical barrier can be used to prevent the reactants in the individual reaction regions from moving to adjacent
reaction regions. Alternatively, a dimple or other recess can be used to prevent the reactant components in the
individual reaction regions from moving to adjacent reaction regions.

B. Generating An Array Using Light-Directed Methods

An array of diverse oligonucleotides at known locations on a single substrate surface can
be formed using a variety of techniques known to those skilled in the art of polymer synthesis on solid supports. For
example, "light directed" methods (which are one technique in a family of methods known as VLSIPS™ methods) are
described in U.S. Patent No. 5,143,854, previously incorporated by reference. The light directed methods discussed in
the '854 patent involve activating predefined regions of a substrate or solid support and then contacting the substrate
with a preselected monomer solution. The predefined regions can be activated with a light source, typically shone
through a mask (much in the manner of photolithography techniques used in integrated circuit fabrication). Other
regions of the substrate remain inactive because they are blocked by the mask from illumination and remain chemically
protected. Thus, a light pattern defines which regions of the substrate react with a given monomer. By repeatedly
activating different sets of predefined regions and contacting different monomer solutions with the substrate, a
diverse array of polymers is produced on the substrate. Of course, other steps such as washing unreacted monomer
solution from the substrate can be used as necessary. Other techniques include mechanical techniques such as those
described in PCT No. 92/10183, USSN 07/796,243, also incorporated herein by reference for all purposes. Still further
techniques include bead based techniques such as those described in PCT US/93/04145, also incorporated herein by
reference, and pin based methods such as those described in U.S. Pat. No. 5,288,514, also incorporated herein by
reference.

The VLSIPS™ methods are preferred for generating an array of oligonucleotides on a single substrate. The surface
of the solid support or substrate, optionally modified with spacers having photolabile protecting groups such as
NVOC and MeNPOC, is illuminated through a photolithographic mask, yielding reactive groups (typically hydroxyl groups)
in the illuminated regions. A 3'-O-phosphoramidite activated deoxynucleoside (protected at the 5'-hydroxyl with a
photolabile protecting group) is then presented to the surface and chemical coupling occurs at sites that were exposed
to light. Following capping and oxidation, the substrate is rinsed and the surface illuminated through a second mask to
expose additional hydroxyl groups for coupling. A second 5'-protected, 3'-O-phosphoramidite activated deoxynucleoside
is presented to the surface. The selective photodeprotection and coupling cycles are repeated until the desired set of
oligonucleotides is produced.
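
The repeated cycle of masked illumination followed by monomer coupling can be modelled in a few lines. The sketch below is a simplified illustration, not the method of the '854 patent: the data layout, the region numbering, and the mask patterns are assumptions, and capping, oxidation, and coupling yield are ignored.

```python
# Minimal sketch of mask-directed combinatorial synthesis.
# The data layout, region numbering and masks are assumptions for illustration.
from typing import Dict, List, Set, Tuple

Cycle = Tuple[Set[int], str]  # (regions illuminated through the mask, monomer)

def light_directed_synthesis(n_regions: int, cycles: List[Cycle]) -> Dict[int, str]:
    """Grow one sequence per region; only illuminated regions couple."""
    sequences = {region: "" for region in range(n_regions)}
    for illuminated, monomer in cycles:
        for region in illuminated:
            # Photodeprotection exposes a reactive group only where light
            # passed the mask, so only these regions add the monomer.
            sequences[region] += monomer
    return sequences

# Two positions, four monomers each: 8 cycles give all 16 dinucleotides.
cycles: List[Cycle] = []
for i, base in enumerate("ACGT"):      # first position: blocks of 4 regions
    cycles.append(({r for r in range(16) if r // 4 == i}, base))
for i, base in enumerate("ACGT"):      # second position: interleaved mask
    cycles.append(({r for r in range(16) if r % 4 == i}, base))

# Strings record the order of monomer addition in each region.
array = light_directed_synthesis(16, cycles)
print(len(set(array.values())), "distinct sequences")
```

The point of the sketch is that the number of cycles grows with the number of monomers times the sequence length, while the number of distinct sequences grows exponentially with length.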
B. Generating An Array Of Oligonucleotides Using Flow Channel Or Spotting Methods

In addition to the foregoing, additional methods which can be used to generate an array of oligonucleotides on a
single substrate are described in co-pending Applications Ser. No. 07/980,523, filed November 20, 1992, and 07/796,243,
filed November 22, 1991, incorporated herein by reference for all purposes. In the methods disclosed in these
applications, reagents are delivered to the substrate by either (1) flowing within a channel defined on predefined
regions or (2) spotting on predefined regions. However, other approaches, as well as combinations of spotting and
flowing, may be employed. In each instance, certain activated regions of the substrate are mechanically separated from
other regions when the monomer solutions are delivered to the various reaction sites.

A typical "flow channel" method applied to the compounds and libraries of the present invention can generally be
described as follows. Diverse polymer sequences are synthesized at selected regions of a substrate or solid support by
forming flow channels on a surface of the substrate through which appropriate reagents flow or in which appropriate
reagents are placed. For example, assume a monomer "A" is to be bound to the substrate in a first group of selected
regions. If necessary, all or part of the surface of the substrate in all or a part of the selected regions is
activated for binding by, for example, flowing appropriate reagents through all or some of the channels, or by washing
the entire substrate with appropriate reagents. After placement of a channel block on the surface of the substrate, a
reagent having the monomer A flows through or is placed in all or some of the channel(s). The channels provide fluid
contact to the first selected regions, thereby binding the monomer A on the substrate directly or indirectly (via a
spacer) in the first selected regions.

Thereafter, a monomer B is coupled to second selected regions, some of which may be included among the first
selected regions. The second selected regions will be in fluid contact with second flow channel(s) through translation,
rotation, or replacement of the channel block on the surface of the substrate; through opening or closing a selected
valve; or through deposition of a layer of chemical or photoresist. If necessary, a step is performed for activating at
least the second regions. Thereafter, the monomer B is flowed through or placed in the second flow channel(s), binding
monomer B at the second selected locations. In this particular example, the resulting sequences bound to the substrate
at this stage of processing will be, for example, A, B, and AB. The process is repeated to form a vast array of
sequences of desired length at known locations on the substrate.
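
The two-step example above, in which some second regions overlap the first, can be sketched as follows. The region labels and the helper name are hypothetical; the sketch only tracks which monomers each region has seen, in order.

```python
# Sketch of the two-step flow channel example; region labels are hypothetical.
def couple(sequences, regions, monomer):
    """Append a monomer to every region currently in fluid contact."""
    for region in regions:
        sequences[region] = sequences.get(region, "") + monomer
    return sequences

bound = {}
first_regions = {"r1", "r2"}    # channels carrying monomer A
second_regions = {"r2", "r3"}   # channel block repositioned; r2 is shared
couple(bound, first_regions, "A")
couple(bound, second_regions, "B")
print(sorted(bound.items()))
```

The shared region ends up with AB, while the regions contacted only once carry A or B, matching the A, B, and AB outcome described in the text.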

After the substrate is activated, monomer A can be flowed through some of the channels, monomer B can be flowed
through other channels, a monomer C can be flowed through still other channels, etc. In this manner, many or all of the
reaction regions are reacted with a monomer before the channel block must be moved or the substrate must be washed
and/or reactivated. By making use of many or all of the available reaction regions simultaneously, the number of
washing and activation steps can be minimized.

One of skill in the art will recognize that there are alternative methods of forming channels or otherwise protecting
a portion of the surface of the substrate. For example, according to some embodiments, a protective coating such as a
hydrophilic or hydrophobic coating (depending upon the nature of the solvent) is utilized over portions of the substrate
to be protected, sometimes in combination with materials that facilitate wetting by the reactant solution in other
regions. In this manner, the flowing solutions are further prevented from passing outside of their designated flow
paths.

The "spotting" methods of preparing compounds and libraries of the present invention can be implemented in much
the same manner as the flow channel methods. For example, a monomer A can be delivered to and coupled with a first
group of reaction regions which have been appropriately activated. Thereafter, a monomer B can be delivered to and
reacted with a second group of activated reaction regions. Unlike the flow channel embodiments described above,
reactants are delivered by directly depositing (rather than flowing) relatively small quantities of them in selected
regions. In some steps, of course, the entire substrate surface can be sprayed or otherwise coated with a solution. In
preferred embodiments, a dispenser moves from region to region, depositing only as much monomer as necessary at each
stop. Typical dispensers include a micropipette to deliver the monomer solution to the substrate and a robotic system
to control the position of the micropipette with respect to the substrate. In other embodiments, the dispenser includes
a series of tubes, a manifold, an array of pipettes, or the like so that various reagents can be delivered to the
reaction regions simultaneously.

C. Generating An Array Of Oligonucleotides Using Pin-Based Methods 

Another method which is useful for the preparation of an array of diverse oligonucleotides on a single substrate
involves "pin based synthesis." This method is described in detail in U.S. Patent No. 5,288,514, previously incorporated
herein by reference. The method utilizes a substrate having a plurality of pins or other extensions. The pins are each
inserted simultaneously into individual reagent containers in a tray. In a common embodiment, an array of 96
pins/containers is utilized.

Each tray is filled with a particular reagent for coupling in a particular chemical reaction on an individual pin.
Accordingly, the trays will often contain different reagents. Since the chemistry used is such that relatively similar
reaction conditions may be utilized to perform each of the reactions, multiple chemical coupling steps can be conducted
simultaneously. In the first step of the process, a substrate on which the chemical coupling steps are conducted is
provided. The substrate is optionally provided with a spacer having active sites. In the particular case of
oligonucleotides, for example, the spacer may be selected from a wide variety of molecules which can be used in organic
environments associated with synthesis as well as in aqueous environments associated with binding studies. Examples of
suitable spacers are polyethyleneglycols, dicarboxylic acids, polyamines and alkylenes, substituted with, for example,
methoxy and ethoxy groups. Additionally, the spacers will have an active site on the distal end. The active sites are
optionally protected initially by protecting groups. Among a wide variety of protecting groups which are useful are
FMOC, BOC, t-butyl esters, t-butyl ethers, and the like. Various exemplary protecting groups are described in, for
example, Atherton et al., Solid Phase Peptide Synthesis, IRL Press (1989), incorporated herein by reference. In some
embodiments, the spacer may provide for a cleavable function by way of, for example, exposure to acid or base.

D. Generating An Array Of Oligonucleotides Using Bead Based Methods 

In addition to the foregoing methods, another method which is useful for synthesis of an array of oligonucleotides
involves "bead based synthesis." A general approach for bead based synthesis is described in co-pending Application
Ser. Nos. 07/762,522 (filed September 18, 1991); 07/946,239 (filed September 16, 1992); 08/146,886 (filed November
2, 1993); 07/876,792 (filed April 29, 1992) and PCT/US93/04145 (filed April 28, 1993), the disclosures of which are
incorporated herein by reference.

For the synthesis of molecules such as oligonucleotides on beads, a large plurality of beads are suspended in a
suitable carrier (such as water) in a container. The beads are provided with optional spacer molecules having an active
site. The active site is protected by an optional protecting group.

In a first step of the synthesis, the beads are divided for coupling into a plurality of containers. For the purposes
of this brief description, the number of containers will be limited to three, and the monomers denoted as A, B, C, D, E
and F. The protecting groups are then removed and a first portion of the molecule to be synthesized is added to each of
the three containers (i.e., A is added to container 1, B is added to container 2 and C is added to container 3).

Thereafter, the various beads are appropriately washed of excess reagents, and remixed in one container. Again,
it will be recognized that by virtue of the large number of beads utilized at the outset, there will similarly be a
large number of beads randomly dispersed in the container, each having a particular first portion of the monomer to be
synthesized on a surface thereof.

Thereafter, the various beads are again divided for coupling in another group of three containers. The beads in the
first container are deprotected and exposed to a second monomer (D), while the beads in the second and third
containers are coupled to molecule portions E and F, respectively. Accordingly, molecules AD, BD and CD will be present
in the first container, while AE, BE and CE will be present in the second container, and molecules AF, BF and CF will be
present in the third container. Each bead, however, will have only a single type of molecule on its surface. Thus, all
of the possible molecules formed from the first portions A, B, C, and the second portions D, E and F have been formed.
The beads are then recombined into one container and additional steps are conducted to complete the synthesis
of the polymer molecules. In a preferred embodiment, the beads are tagged with an identifying tag which is unique to
the particular oligonucleotide or probe which is present on each bead. A complete description of identifier tags for
use in synthetic libraries is provided in co-pending Application Ser. No. 08/146,886 (filed November 2, 1993),
previously incorporated by reference for all purposes.
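
The divide, couple, and recombine procedure described above can be simulated as a sketch. The assumptions here are illustrative: each bead is assigned to a container uniformly at random in every round, coupling is treated as complete, and the function name and seed are hypothetical.

```python
# Sketch of split-and-pool bead synthesis; names and seed are hypothetical.
import random
from collections import Counter

def split_and_pool(n_beads, rounds, seed=0):
    """Each round: split beads at random among containers, couple one
    monomer per container, then pool. Returns the sequence on each bead."""
    rng = random.Random(seed)
    beads = [""] * n_beads
    for monomers in rounds:
        # One container per monomer; a bead lands in exactly one of them.
        beads = [seq + rng.choice(monomers) for seq in beads]
    return beads

beads = split_and_pool(9000, [["A", "B", "C"], ["D", "E", "F"]])
counts = Counter(beads)
print(sorted(counts))
```

Each bead carries a single sequence, and with a large number of beads the pool contains all nine combinations of a first portion (A, B, C) with a second portion (D, E, F).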

IV. Sequencing By Hybridization Using the Probe Tiling Strategy 

Using the VLSIPS™ technology described above, one can generate arrays of immobilized probes which can be
used to compare a reference sequence of known sequence with a target sequence showing substantial similarity with
the reference sequence, but differing in the presence of, for example, mutations. In fact, WO 95/11995, the teachings
of which are incorporated herein by reference, describes a number of strategies for such comparisons. The comparison
can be performed at the level of entire genomes, chromosomes, genes, exons or introns, or it can focus on individual
mutant sites and immediately adjacent bases. The strategies allow detection of variations, such as mutations or
polymorphisms, in the target sequence irrespective of whether a particular variant has previously been characterized.
The strategies both define the nature of a variant and identify its location in a target sequence.

The strategies employ arrays of oligonucleotide probes immobilized to a solid support. Target sequences are analyzed
by determining the extent of hybridization at particular probes in the array. The strategy in selection of probes
facilitates distinction between perfectly matched probes and probes showing single-base or other degrees of mismatches.
The strategy usually entails sampling each nucleotide of interest in a target sequence several times, thereby
achieving a high degree of confidence in its identity. This level of confidence is further increased by sampling of
adjacent nucleotides in the target sequence to nucleotides of interest. The tiling strategies disclosed in WO 95/11995
result in sequencing and comparison methods suitable for routine large-scale practice with a high degree of confidence
in the sequence output.

A. Selection Of Reference Sequence

The arrays are designed to contain probes exhibiting complementarity to one or more selected reference sequences
whose sequence is known. Typically, the arrays are used to read a target sequence comprising either the reference
sequence itself or variants of that sequence. Target sequences may differ from the reference sequence at one or more
positions but show a high overall degree of sequence identity with the reference sequence (e.g., at least 75, 90, 95,
99, 99.9 or 99.99%). Any polynucleotide of known sequence can be selected as a reference sequence. Reference sequences
of interest include sequences known to include mutations or polymorphisms associated with phenotypic changes having
clinical significance in human patients. For example, the CFTR gene and P53 gene in humans have been identified as
the location of several mutations resulting in cystic fibrosis or cancer respectively. Other reference sequences of
interest include those that serve to identify pathogenic microorganisms and/or are the site of mutations by which such
microorganisms acquire drug resistance (e.g., the HIV reverse transcriptase gene). Other reference sequences of interest
include regions where polymorphic variations are known to occur (e.g., the D-loop region of mitochondrial DNA). These
reference sequences have utility for, e.g., forensic or epidemiological studies. Other reference sequences of interest
include p34 (related to p53), p65 (implicated in breast, prostate and liver cancer), and DNA segments encoding
cytochromes P450 and other biotransformation genes (see Meyer et al., Pharmac. Ther. 46, 349-355 (1990)). Other
reference sequences of interest include those from the genome of pathogenic viruses (e.g., hepatitis (A, B, or C),
herpes virus (e.g., VZV, HSV-1, HAV-6, HSV-II, and CMV, Epstein Barr virus), adenovirus, influenza virus, flaviviruses,
echovirus, rhinovirus, coxsackievirus, coronavirus, respiratory syncytial virus, mumps virus, rotavirus, measles virus,
rubella virus, parvovirus, vaccinia virus, HTLV virus, dengue virus, papillomavirus, molluscum virus, poliovirus,
rabies virus, JC virus

* « iSSJSgS^S? er reference sequences which <*" be analyzed usins theti,in s — » 

The length of a reference sequence can vary widely from a full-length genome, to an individual chromosome, episome,
gene, component of a gene, such as an exon, intron or regulatory sequences, to a few nucleotides. A reference
sequence of between about 2, 5, 10, 20, 50, 100, 5000, 1000, 5,000 or 10,000, 20,000 or 100,000 nucleotides is common.
Sometimes only particular regions of a sequence (e.g., exons of a gene) are of interest. In such situations, the
particular regions can be considered as separate reference sequences or can be considered as components of a single
reference sequence, as a matter of arbitrary choice.

A reference sequence can be any naturally occurring, mutant, consensus or purely hypothetical sequence of
nucleotides, RNA or DNA. For example, sequences can be obtained from computer data bases, publications or can be
determined or conceived de novo. Usually, a reference sequence is selected to show a high degree of sequence identity to
envisaged target sequences. Often, particularly where a significant degree of divergence is anticipated between target
sequences, more than one reference sequence is selected. Combinations of wildtype and mutant reference sequences
are employed in several applications of the tiling strategy.
B. Array Design

1. Basic Tiling Strategy



The basic tiling strategy provides an array of immobilized probes for analysis of target sequences showing a high
degree of sequence identity to one or more selected reference sequences. The strategy is first illustrated for an array
that is subdivided into four probe sets. A first probe set comprises a plurality of probes exhibiting perfect
complementarity with a selected reference sequence. The perfect complementarity usually exists throughout the length of
the probe. However, probes having a segment or segments of perfect complementarity that is/are flanked by leading or
trailing sequences lacking complementarity to the reference sequence can also be used. Within a segment of
complementarity, each probe in the first probe set has at least one interrogation position that corresponds to a
nucleotide in the reference sequence. That is, the interrogation position is aligned with the corresponding nucleotide
in the reference sequence when the probe and reference sequence are aligned to maximize complementarity between the
two. If a probe has more than one interrogation position, each corresponds with a respective nucleotide in the
reference sequence. The identity of an interrogation position and corresponding nucleotide in a particular probe in the
first probe set cannot be determined simply by inspection of the probe in the first set. An interrogation position and
corresponding nucleotide is defined by the comparative structures of probes in the first probe set and corresponding
probes from additional probe sets.

In principle, a probe could have an interrogation position at each position in the segment complementary to the
reference sequence. Sometimes, interrogation positions provide more accurate data when located away from the ends
[...] interrogation positions. Often the probes contain a single interrogation position
at or near the center of the probe.

For each probe in the first set, there are, for purposes of the present illustration, up to three corresponding probes
from three additional probe sets. [...] Thus, there are four probes corresponding to each nucleotide of interest
in the reference sequence. [...] Usually, the probes from the three additional probe sets are identical to the corresponding
probe [...]

[...] are capable of hybridizing with a target nucleic sequence [...] compatible with the preferred chemistry for
solid phase synthesis of oligonucleotides.

The number of probes in the first probe set (and as a consequence the number of probes in additional probe sets)
[...] comparing the relative hybridization signals at four probes having interrogation positions corresponding
[...] In some reference sequences, every nucleotide is of interest. In other reference sequences, only certain portions
[...] probe in that set by the omission of a 3' base complementary to the reference sequence and the acquisition of a 5' base
complementary to the reference sequence [...]

[...] However, often only a relatively small proportion [...] of the total number of probes of a given length are selected
to pursue a particular tiling strategy. For example, a complete set of octomer probes comprises 65,536 probes; thus, an
array of [...] octomer probes. A complete array of decamer probes comprises 1,048,576 [...] lower limit of 25, 50 or 100
probes and an upper limit of 1,000,000, 100,000, 10,000 or 1000 probes. The arrays can have other components besides
the probes such as linkers attaching the probes to a support.
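The probe counts quoted above follow from there being four possible nucleotides at each position of a probe. As a rough illustrative sketch only (the function names are hypothetical and form no part of the disclosure), the size of a complete set of n-mers can be compared with the number of probes needed by a four-set tiling, assuming one interrogation position per probe:

```python
def complete_set_size(probe_length: int) -> int:
    """Number of distinct probes of a given length over the four nucleotides."""
    return 4 ** probe_length


def basic_tiling_size(nucleotides_of_interest: int, probe_sets: int = 4) -> int:
    """Probes needed when each nucleotide of interest gets one probe per probe set.

    Assumes a single interrogation position per probe (illustrative assumption).
    """
    return probe_sets * nucleotides_of_interest


if __name__ == "__main__":
    for length in (8, 10):
        print(length, complete_set_size(length))
    # Tiling 1000 nucleotides of interest with four probe sets
    print(basic_tiling_size(1000))
```

Under these assumptions, tiling even a long reference sequence uses only a small fraction of the complete set of probes of a given length.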

Some advantages of the use of only a proportion of all possible probes of a given length include: (i) each position
in the array is highly informative, whether or not hybridization occurs; (ii) nonspecific hybridization is minimized; (iii) it is
straightforward to correlate hybridization differences with sequence differences, particularly with reference to the
hybridization pattern of a known standard; and (iv) the ability to address each probe [...]

For conceptual simplicity, the probes in a set are usually arranged in order of the sequence in a lane across the
array. A lane contains a series of overlapping probes, which represent, or tile across, the selected reference sequence
(see FIG. 13). The components of the four sets of probes are usually laid down in four parallel lanes, collectively
constituting a row in the horizontal direction and a series of 4-member columns in the vertical direction. Corresponding
probes from the four probe sets (i.e., complementary to the same subsequence of the reference sequence) occupy a
column. Each probe in a lane usually differs from its predecessor in the lane by the omission of a base at one end and
the inclusion of an additional base at the other end, as shown in FIG. 13. However, this orderly progression of probes can be
interrupted by the inclusion of control probes or omission of probes in certain columns of the array. Such columns serve
as controls to orient the array or gauge the background, which can include target sequence nonspecifically bound to the array.

The probe sets are usually laid down in lanes such that all probes having an interrogation position occupied by an
A form an A-lane, all probes having an interrogation position occupied by a C form a C-lane, all probes having an
interrogation position occupied by a G form a G-lane, and all probes having an interrogation position occupied by a T
(or U) form a T lane (or a U lane). Note that in this arrangement there is not a unique correspondence between probe
sets and lanes. Thus, the probe from the first probe set is laid down in the A-lane, C-lane, [...] for
the five columns in FIG. 14A. The interrogation position on a column of probes corresponds to the position in the target
sequence whose identity is determined from analysis of hybridization [...]
FIG. 14A. The interrogation position can be anywhere in a probe but is usually at or near the
central position of the probe to maximize differential hybridization signals between a perfect match and a single-base
mismatch. For example, for an 11 mer probe, the central position is the sixth nucleotide.
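The lane arrangement described above can be sketched in code. The following is a minimal illustration only, assuming that the reference sequence is written 5' to 3', that each probe is the reverse complement of an odd-length window of the reference, and that the single interrogation position sits at the center of the probe. The names `tile_reference` and `reverse_complement` are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile_reference(reference: str, probe_len: int = 11) -> list:
    """Build one column of four probes for each fully covered reference position.

    Each column maps a lane letter (the base at the interrogation position)
    to a probe. The probe in the lane complementary to the reference base is
    the perfectly matched probe; the other three carry a single mismatch.
    """
    if probe_len % 2 == 0:
        raise ValueError("use an odd probe length so the center is well defined")
    half = probe_len // 2
    columns = []
    for center in range(half, len(reference) - half):
        window = reference[center - half:center + half + 1]
        perfect = reverse_complement(window)
        lanes = {
            base: perfect[:half] + base + perfect[half + 1:]
            for base in "ACGT"
        }
        columns.append({
            "ref_pos": center,
            "ref_base": reference[center],
            "perfect_lane": perfect[half],
            "lanes": lanes,
        })
    return columns


if __name__ == "__main__":
    for column in tile_reference("ACGTTGCAAGCTA", probe_len=5)[:3]:
        print(column["ref_pos"], column["ref_base"], column["lanes"])
```

In this sketch the perfectly matched probe moves between lanes from column to column, which mirrors the remark that there is no unique correspondence between probe sets and lanes.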

Although the array of probes is usually laid down in rows and columns as described above, such a physical
arrangement of probes on the array is not essential. Provided that the spatial location of each probe in an array is known,
the data from the probes can be collected and processed to yield the sequence of a target irrespective of the physical
arrangement of the probes on an array. In processing the data, the hybridization signals from the respective probes can
be reassorted into any conceptual array desired for subsequent data reduction whatever the physical arrangement of
probes on the array.

A range of lengths of probes can be employed in the arrays. As noted above, a probe may consist exclusively of a
complementary segment, or may have one or more complementary segments juxtaposed by flanking, trailing and/or
intervening segments. In the latter situation, the total length of complementary segment(s) is more important than the
length of the probe. In functional terms, the complementary segment(s) of the first probe sets should be sufficiently long
to allow the probe to hybridize detectably more strongly to a reference sequence compared with a variant of the reference
including a single base mutation at the nucleotide corresponding to the interrogation position of the probe. Similarly,
the complementary segment(s) in corresponding probes from additional probe sets should be sufficiently long to allow a
probe to hybridize detectably more strongly to a variant of the reference sequence having a single nucleotide substitution
at the interrogation position relative to the reference sequence. A probe usually has a single complementary segment
[...] bases exhibiting perfect complementarity (other than possibly at the interrogation position(s)
[...]) to the reference sequence. In bridging strategies, where more than one segment of
complementarity is present, each segment provides at least three complementary nucleotides to the reference sequence
and the combined segments provide at least two segments of three or a total of six complementary nucleotides. [...]
the combined length of complementary segments is typically from 6-30 nucleotides [...]
have an odd number of bases, so that an interrogation position can occur in the exact center of the probe.

In some arrays, all probes are the same length. Other arrays employ different groups of probe sets, in which case
the probes are of the same size within a group, but differ between different groups. For example, some arrays have one
group comprising four sets of probes as described above in which all the probes are [...]
[...] contain, e.g., four groups of probes having sizes of 11 mers, 13 mers, [...]
mers. Other arrays have different size probes within the same group of four probe sets. In these arrays, the probes in
the first set can vary in length independently of each other. Probes in the other sets are usually the same length as the
probe occupying the same column from the first set. However, occasionally different lengths of probes can be included
[...]
from probes irrespective of whether A-T or C-G bonds are formed at the interrogation position.

The length of probe can be important in distinguishing between a perfectly matched probe and
probes showing a single-base mismatch with the target sequence. The discrimination is usually greater for short probes.
Shorter probes are usually also less susceptible to formation of secondary structures. However, the absolute amount of
target sequence bound, and hence the signal, is greater for larger probes. The probe length representing the optimum
compromise between these competing considerations may vary depending on, inter alia, the GC content of a particular
region of the target DNA sequence, secondary structure, synthesis efficiency and cross-hybridization. In some regions of
the target, depending on hybridization conditions, short probes (e.g., 11 mers) may provide information that is inaccessible from
longer probes (e.g., 19 mers) and vice versa. Maximum sequence information can be read by including several groups
of different sized probes on the array as noted above. However, for many regions of the target sequence, such a strategy
provides redundant information in that the same sequence is read multiple times from the different groups of probes.
Equivalent information can be obtained from a single group of different sized probes in which the sizes are selected to
maximize readable sequence at particular regions of the target sequence. The strategy of customizing probe length
minimizes the total number of probes required to read a particular target sequence.
This leaves ample capacity for the array to include probes to other reference sequences.

[...] Hybridization of the array to the reference sequence or the mutant form of the reference sequence identifies the
probe length and interrogation position providing the greatest differential hybridization signal.

Variation of interrogation position in probes for analyzing different regions of a target sequence offers a number of
advantages. If a segment of a target sequence contains two closely spaced mutations, m1 and m2, and probes for
analyzing that segment have an interrogation position at or near the middle, then no probe has an interrogation position
aligned with one of the mutations without overlapping the other mutation [...]
[...] 5' end of the probes, probes can have their interrogation position aligned with m2 without
[...]
also offers the advantage of reducing loss of signal due to self-annealing of certain
probes. FIG. 14C shows a target sequence having a nucleotide X, which can be read either from the relative
[...]
the noncoding strand. Thus, combination of the information from coding and noncoding strands increases the overall
[...]
mutations are expected [...] principle can be extended to provide arrays containing groups of
[...]
part of a tiled array as noted above, but rather serves as probe(s) for a conventional reverse dot blot. For example, the
presence of a mutation can be detected from binding of a target sequence to a single oligomeric probe harboring the
mutation. [...] additional probe containing the equivalent region of the wildtype sequence is included as a
control.



Although only a subset of probes is required to analyze a particular target sequence, it is quite possible that other
probes superfluous to the contemplated analysis are also included on the array. In the extreme case, the array could
contain a complete set of all probes of a given length notwithstanding that only a small subset is required to analyze the
[...]
a complete set of probes offers the advantage of including the appropriate subset of probes for analyzing any reference
sequence. Such an array also allows simultaneous analysis of a reference sequence from different subsets of probes
(e.g., subsets having the interrogation site at different positions in the probe).

[...] reveals whether the target is the same as or different from the
reference sequence. If the two are the same, all probes in the first probe set show a stronger hybridization signal than
corresponding probes from other probe sets. If the two are different, most probes from the first probe set still show a
[...] When a probe from another probe set lights up more strongly than the corresponding probe from the
first probe set, this provides a simple visual indication that the target sequence and reference sequence differ.
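The comparison described above can be sketched as a simple check over columns. This is an illustrative sketch under stated assumptions: each column is represented as a dictionary with a hypothetical key `first` for the signal of the probe from the first probe set and `others` for the signals of the corresponding probes from the other probe sets. The signal values in the example are made up.

```python
def differing_columns(columns: list) -> list:
    """Return indices of columns where a probe from another probe set
    gives a stronger signal than the probe from the first probe set."""
    return [
        index
        for index, column in enumerate(columns)
        if max(column["others"]) > column["first"]
    ]


if __name__ == "__main__":
    example = [
        {"first": 90, "others": [5, 7, 6]},
        {"first": 20, "others": [80, 4, 6]},   # target differs from reference here
        {"first": 70, "others": [10, 9, 8]},
    ]
    print(differing_columns(example))
```

An empty result corresponds to a target that appears identical to the reference sequence at every tiled position.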

[...] labelled target bound to the probes in an array. Specifically, for each nucleotide
[...] a comparison is performed between probes having an interrogation position aligned
[...] These probes form a column (actual or conceptual) on the array. For example, a column often contains
one probe from each of A, C, G and T lanes. The nucleotide in the target sequence is identified as the complement of
the nucleotide occupying the interrogation position in the probe showing the highest hybridization signal from a column.
[...] hybridized to its reference sequence. The dark square in each column
represents the probe from the column having the highest hybridization signal. The sequence can be read by following
the pattern of dark squares from left to right across the array. The first dark square is in the A lane, indicating that the
nucleotide occupying the interrogation position of the probe represented by this square is an A. The first nucleotide in
the reference sequence is the complement of the nucleotide occupying the interrogation position of this probe (i.e., a T).
Similarly, the second dark square is in the T-lane, from which it can be deduced that the second nucleotide in the reference
sequence is an A. Likewise the third dark square is in the T-lane, from which it can be deduced that the third nucleotide
in the reference sequence is also an A, and so forth. By including probes in the first probe set (and by implication in the
other probe sets) with interrogation positions corresponding to every nucleotide in a reference sequence, it is possible
[...]

Of the four probes in a column, only one can exhibit a perfect match to the target sequence whereas the others
usually exhibit at least a one base pair mismatch. The probe exhibiting a perfect match usually produces a substantially
greater hybridization signal than the other three probes in the column and is thereby easily identified. However, in some
regions of the target sequence, the distinction between a perfect match and a one-base mismatch is less clear. Thus,
a call ratio is established to define the ratio of signal from the best hybridizing probe to the second best hybridizing
probe that must be exceeded for a particular target position to be read from the probes. A high call ratio ensures that
few if any errors are made in calling target nucleotides, but can result in some nucleotides being
scored as ambiguous, which could in fact be accurately read. A lower call ratio results in fewer ambiguous calls, but can
result in more erroneous calls. It has been found that at a call ratio of 1.2 virtually all calls are accurate. However, a small
but significant number of bases (e.g., up to about 10%) may have to be scored as ambiguous.
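The base-calling rule and call ratio described above can be sketched as follows. This is a minimal illustration, not the applicants' implementation: each column is assumed to be a dictionary mapping the lane letter (the base at the interrogation position) to a hybridization signal, ambiguous positions are scored with the hypothetical symbol "N", and the default call ratio of 1.2 is the value mentioned in the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(signals: dict, call_ratio: float = 1.2) -> str:
    """Call one target nucleotide from the four lane signals of a column.

    The target base is the complement of the lane with the highest signal,
    provided the best signal exceeds call_ratio times the second best.
    Otherwise the position is scored as ambiguous ("N").
    """
    ranked = sorted(signals.values(), reverse=True)
    best, second = ranked[0], ranked[1]
    if best <= 0 or best <= call_ratio * second:
        return "N"
    best_lane = max(signals, key=signals.get)
    return COMPLEMENT[best_lane]


def read_sequence(columns: list, call_ratio: float = 1.2) -> str:
    """Read a target sequence column by column, left to right."""
    return "".join(call_base(column, call_ratio) for column in columns)


if __name__ == "__main__":
    # Strongest signals in the A, T and T lanes, as in the example in the text
    columns = [
        {"A": 9, "C": 1, "G": 1, "T": 1},
        {"A": 1, "C": 1, "G": 1, "T": 9},
        {"A": 1, "C": 1, "G": 1, "T": 9},
    ]
    print(read_sequence(columns))
```

Raising `call_ratio` in this sketch trades erroneous calls for ambiguous ones, which is the trade-off the text describes.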

Although small regions of the target sequence can sometimes be ambiguous, these regions usually occur at the
same or similar segments in different target sequences. Thus, for precharacterized mutations, it is known in advance
whether that mutation is likely to occur within a region of unambiguously determinable sequence.

An array of probes is most useful for analyzing the reference sequence from which the probes were designed and
variants of that sequence exhibiting substantial sequence similarity with the reference sequence (e.g., [...]
spaced over the reference sequence). When an array is used to analyze the reference sequence [...]
one probe exhibits a perfect match to the reference sequence, and the other three probes
in the same column exhibit single-base mismatches. Thus, discrimination between hybridization signals is usually high
and accurate sequence is obtained. High accuracy is also obtained when an array is used for analyzing a target sequence
comprising a variant of the reference sequence that has a single mutation relative to the reference sequence, or several
widely spaced mutations relative to the reference sequence. At different mutant loci, one probe exhibits a perfect match
[...], the difference
(with respect to analysis of the reference sequence) being the lane in which the perfect match occurs.
For target sequences showing a high degree of divergence from the reference strain or incorporating several closely
spaced mutations from the reference strain, a single group of probes (i.e., designed with respect to a single
reference sequence) will not always provide accurate sequence for the highly variant region of this sequence. At some
particular columnar positions, [...] different degrees of mismatch between the four probes. Such a comparison does not
always allow the target nucleotide corresponding to that columnar position to be called. Deletions in target sequences can
be detected by loss of signal from probes having interrogation positions encompassed by the deletion. However, signal may
also be lost from probes having interrogation positions closely proximal to the deletion, resulting in some regions of the target
sequence that cannot be read. Target sequence bearing insertions will also exhibit short regions including and proximal
to the insertion that usually cannot be read.

The presence of short regions of difficult-to-read target because of closely spaced mutations, insertions or deletions, 
does not prevent determination of the remaining sequence of the target as different regions of a target sequence are
determined independently. Moreover, such ambiguities as might result from analysis of diverse variants with a single
group of probes can be avoided by including multiple groups of probe sets on an array. For example, one group of probes
can be designed based on a full-length reference sequence, and the other groups on subsequences of the reference 
sequence incorporating frequently occurring mutations or strain variations. 

A particular advantage of the present sequencing strategy over conventional sequencing methods is the capacity
simultaneously to detect and quantify proportions of multiple target sequences. Such capacity is valuable, e.g., for
diagnosis of patients who are heterozygous with respect to a gene or who are infected with a virus, such as HIV, which is
usually present in several polymorphic forms. Such capacity is also useful in analyzing targets from biopsies of tumor
cells and surrounding tissues. The presence of multiple target sequences is detected from the relative signals of the
four probes at the array columns corresponding to the target nucleotides at which diversity occurs. The relative signals
of the four probes for the mixture under test are compared with the corresponding signals from a homogeneous reference
sequence. An increase in a signal from a probe that is mismatched with respect to the reference sequence, and a
corresponding decrease in the signal from the probe which is matched with the reference sequence, signal the presence
of a mutant strain in the mixture. The extent of the shift in hybridization signals of the probes is related to the proportion of
a target sequence in the mixture. Shifts in relative hybridization signals can be quantitatively related to proportions of
reference and mutant sequence by prior calibration of the array with seeded mixtures of the mutant and reference
sequences. By this means, an array can be used to detect variant or mutant strains constituting as little as 1, 5, 20, or 25%
of a mixture of strains.
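The calibration step described above can be sketched numerically. This is an illustrative sketch only: it assumes the "shift" is expressed as the mutant-matched probe's share of the combined signal of the mutant-matched and reference-matched probes, and that proportions are recovered by piecewise-linear interpolation between seeded mixtures. Neither choice is specified in the text, the function names are hypothetical, and the calibration values shown are made up for the example.

```python
def signal_shift(ref_signal: float, mut_signal: float) -> float:
    """Share of the combined signal contributed by the mutant-matched probe."""
    total = ref_signal + mut_signal
    if total <= 0:
        raise ValueError("signals must sum to a positive value")
    return mut_signal / total


def estimate_proportion(shift: float, calibration: list) -> float:
    """Estimate the mutant proportion from an observed shift.

    calibration is a list of (known_proportion, observed_shift) pairs obtained
    from seeded mixtures; values outside the calibrated range are clamped.
    """
    points = sorted(calibration, key=lambda point: point[1])
    if shift <= points[0][1]:
        return points[0][0]
    if shift >= points[-1][1]:
        return points[-1][0]
    for (p0, s0), (p1, s1) in zip(points, points[1:]):
        if s0 <= shift <= s1:
            if s1 == s0:
                return p0
            return p0 + (p1 - p0) * (shift - s0) / (s1 - s0)
    return points[-1][0]


if __name__ == "__main__":
    # Hypothetical seeded mixtures: (mutant proportion, observed shift)
    calibration = [(0.0, 0.05), (0.25, 0.30), (0.5, 0.55), (1.0, 0.95)]
    observed = signal_shift(ref_signal=57.5, mut_signal=42.5)
    print(estimate_proportion(observed, calibration))
```

The nonzero shift at a proportion of zero in the hypothetical calibration stands in for background signal from the mismatched probe, which is why calibration against seeded mixtures is needed at all.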

Similar principles allow the simultaneous analysis of multiple target sequences even when none is identical to the
reference sequence. For example, with a mixture of two target sequences bearing first and second mutations, there
would be a variation in the hybridization patterns of probes having interrogation positions corresponding to the first and
second mutations relative to the hybridization pattern with the reference sequence. At each position, one of the probes
having a mismatched interrogation position relative to the reference sequence would show an increase in hybridization
signal, and the probe having a matched interrogation position relative to the reference sequence would show a decrease
in hybridization signal. Analysis of the hybridization pattern of the mixture of mutant target sequences, preferably in
comparison with the hybridization pattern of the reference sequence, indicates the presence of two mutant target
sequences, the position and nature of the mutation in each strain, and the relative proportions of each strain.

In a variation of the above method, several target sequences are differentially labelled before
being simultaneously applied to the array. For example, each different target sequence can be labelled with a fluorescent
label emitting at a different wavelength. After applying a mixture of target sequences to the array, the individual target
sequences can be distinguished and independently analyzed by virtue of the differential labels. For example,
target sequences obtained from a patient at different stages of a disease can be differently labelled and analyzed
simultaneously, facilitating identification of new mutations.

2. Block Tiling 


In block tiling, a perfectly matched (or wildtype) probe is compared with multiple sets of mismatched or mutant
probes. The perfectly matched probe and the multiple sets of mismatched probes with which it is compared collectively
form a group or block of probes on the array. Each set comprises at least one, and usually three, mismatched probes.
FIG. 16 shows a perfectly matched probe (CAATCGA) having three interrogation positions (I1, I2 and I3). The perfectly
matched probe is compared with three sets of probes (arbitrarily designated A, B and C), each having three mismatched
probes. In set A, the three mismatched probes are identical to a sequence comprising the perfectly matched probe or
a subsequence thereof including the interrogation positions, except at the first interrogation position. That is, the
mismatched probes in set A differ from the perfectly matched probe at the first interrogation position. Thus, the
relative hybridization signals of the perfectly matched probe and the mismatched probes in set A indicate the identity
of the nucleotide in a target sequence corresponding to the first interrogation position. This nucleotide is the complement
of the nucleotide occupying the interrogation position of the probe showing the highest signal. Similarly, set B comprises
three mismatched probes that differ from the perfectly matched probe at the second interrogation position. The relative
hybridization intensities of the perfectly matched probe and the three mismatched probes of set B reveal the identity of
the nucleotide in the target sequence corresponding to the second interrogation position (i.e., n2 in FIG. 16). Similarly,
the three mismatched probes in set C in FIG. 16 differ from the perfectly matched probe at the third interrogation position.
Comparison of the hybridization intensities of the perfectly matched probe and the mismatched probes in set C
reveals the identity of the nucleotide in the target sequence corresponding to the third interrogation position (n3).

As noted above, a perfectly matched probe may have seven or more interrogation positions. If there are seven
interrogation positions, there are seven sets of three mismatched probes, each set serving to identify the nucleotide
corresponding to one of the seven interrogation positions. Similarly, if there are 20 interrogation positions in the perfectly
matched probe, then 20 sets of three mismatched probes are employed. As in other tiling strategies, selected probes
can be omitted if it is known in advance that only certain types of mutations are likely to arise.
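The construction of a block can be sketched as follows, using the perfectly matched probe CAATCGA mentioned above. This is an illustration under stated assumptions: the text does not say which positions of the probe are the three interrogation positions, so the three central positions (zero-based indices 2, 3 and 4) are chosen arbitrarily here, and all function names are hypothetical.

```python
def block_tiling(perfect_probe: str, interrogation_positions: list) -> dict:
    """Build the sets of mismatched probes for one block.

    Returns a dict mapping the index of each interrogation position to the
    three probes that differ from the perfectly matched probe only there.
    """
    block = {}
    for set_index, position in enumerate(interrogation_positions):
        original = perfect_probe[position]
        block[set_index] = [
            perfect_probe[:position] + base + perfect_probe[position + 1:]
            for base in "ACGT"
            if base != original
        ]
    return block


def probes_in_block(n_positions: int) -> int:
    """One perfectly matched probe plus three mismatched probes per position."""
    return 1 + 3 * n_positions


def probes_in_basic_tiling(n_positions: int) -> int:
    """Four probes per position when each probe has one interrogation position."""
    return 4 * n_positions


if __name__ == "__main__":
    # Interrogation positions 2, 3 and 4 are an assumption made for illustration
    for set_index, probes in block_tiling("CAATCGA", [2, 3, 4]).items():
        print(set_index, probes)
    print(probes_in_block(3), probes_in_basic_tiling(3))
```

With three interrogation positions the block holds ten probes rather than twelve, because the perfectly matched probe is shared across the sets.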

Each block of probes allows short regions of a target sequence to be read. For example, for a block of probes having 
seven interrogation positions, seven nucleotides in the target sequence can be read. Of course, an array can contain any
number of blocks depending on how many nucleotides of the target are of interest. The hybridization signals for each 
block can be analyzed independently of any other block. The block tiling strategy can also be combined with other tiling 
strategies, with different parts of the same reference sequence being tiled by different strategies. 

The block tiling strategy is a species of the basic tiling strategy discussed above, in which the probe from the first 
probe set has more than one interrogation position. The perfectly matched probe in the block tiling strategy is equivalent 
to a probe from the first probe set in the basic tiling strategy. The three mismatched probes in set A in block tiling are 
equivalent to probes from the second, third and fourth probe sets in the basic tiling strategy. The three mismatched 
probes in set B of block tiling are equivalent to probes from additional probe sets in basic tiling arbitrarily designated the 
fifth, sixth and seventh probe sets. The three mismatched probes in set C of block tiling are equivalent to probes from
three further probe sets in basic tiling arbitrarily designated the eighth, ninth and tenth probe sets. 

The block tiling strategy offers two advantages over a basic strategy in which each probe in the first set has a single 
interrogation position. One advantage is that the same sequence information can be obtained from fewer probes. A 
second advantage is that each of the probes constituting a block (i.e., a probe from the first probe set and a corresponding
probe from each of the other probe sets) can have identical 3' and 5' sequences, with the variation confined to a central
segment containing the interrogation positions. The identity of 3' sequence between different probes simplifies the
strategy for solid phase synthesis of the probes on the array and results in more uniform deposition of the different probes
on the array, thereby in turn increasing the uniformity of signal to noise ratio for different regions of the array.

VC Enzymatic Discrimination Enhancement 

Unfortunately, using the foregoing tiling strategies as well as other Sequencing By Hybridization techniques (e.g.,
those disclosed in co-pending Application Ser. Nos. 08/082,937 (filed June 25, 1993) and 08/168,904 (filed December
15, 1993), each of which is incorporated herein by reference for all purposes), it is frequently difficult to discriminate
between fully complementary hybrids and those that differ by one or more base pairs. However, it has now been
determined that sequencing by hybridization can be improved by using various enzymes that catalyze oligonucleotide
cleavage and ligation reactions. More particularly, discrimination between fully complementary hybrids and those that
differ by one or more base pairs can be greatly enhanced by using various enzymes that catalyze oligonucleotide cleavage
and ligation reactions.

A. Enhanced Discrimination Using Nuclease Treatment 

Nuclease treatment can be used to improve the quality of hybridization signals on high density oligonucleotide 
arrays. More particularly, after the array of oligonucleotides has been combined with a labelled target nucleic acid to 
form target-oligonucleotide hybrid complexes, the target-oligonucleotide hybrid complexes are treated with a nuclease 
and, in turn, they are washed to remove non-perfectly complementary target-oligonucleotide hybrid complexes. Following 
nuclease treatment, the target-oligonucleotide hybrid complexes which are perfectly complementary are more readily
identified. From the location of the labelled targets, the oligonucleotide probes which hybridized with the targets can be 
identified and, in turn, the sequence of the target nucleic acid can more readily be determined or verified. 

The particular nuclease used will depend on the target nucleic acid being sequenced. If the target is RNA, an RNA
nuclease is used. Similarly, if the target is DNA, a DNA nuclease is used. RNase A is an example of an RNA nuclease
that can be used to increase the quality of RNA hybridization signals on high density oligonucleotide arrays. RNase A
effectively recognizes and cuts single-stranded RNA, including RNA in RNA:DNA hybrids that is not in a perfect
double-stranded structure. Moreover, RNA bulges, loops, and even single base mismatches can be recognized and cleaved by
RNase A. In addition, RNase A recognizes and cleaves target RNA which binds to multiple oligonucleotide probes
present on the substrate if there are intervening single-stranded regions. S1 nuclease and Mung Bean nuclease are
examples of DNA nucleases which can be used to improve the DNA hybridization signals on high density oligonucleotide
arrays. Other nucleases, which will be apparent to those of skill in the art, can similarly be used to increase the quality
of RNA hybridization signals on high density oligonucleotide arrays and, in turn, to more accurately determine the
sequence of the target nucleic acid.

FIG. 4 is a schematic outline of a hybridization procedure which can be carried out prior to nuclease treatment.
Fluorescein-UTP and -CTP labelled RNA is prepared from a PCR product by in vitro transcription. The RNA is fragmented
by heating and allowed to hybridize with an array of oligonucleotide probes on a single substrate. The array of
oligonucleotide probes is generated using the tiling procedure described so that the array of oligonucleotide probes is capable
of recognizing substantially all of the possible subsequences present in the target RNA. Moreover, for purposes of
comparison, the array of oligonucleotides is preferably generated so that all of the four possible probes for a given position
to be identified are in close proximity to one another (i.e., so that they are in predefined regions which are near to one
another). Following hybridization, the substrate is rinsed with the hybridization buffer and a quantitative fluorescence
image of the hybridization pattern is obtained by, for example, scanning the substrate with a confocal microscope. It
should be noted that confocal detection allows hybridization to be measured in the presence of excess labelled target
and, hence, if desired, hybridization can be detected in real time.

Following hybridization, the substrate having an array of target:oligonucleotide hybridization complexes thereon is
contacted with a nuclease. This is most simply carried out by adding a solution of the nuclease to the surface of the
substrate. Alternatively, however, this can be carried out by flowing a solution of the nuclease over the substrate using,
for example, techniques similar to the flow channel methods described above. The nuclease solution is typically formed
using the buffer used to carry out the hybridization reaction (i.e., the hybridization buffer). The concentration of the
nuclease will vary depending on the particular nuclease used, but will typically range from about 0.05 µg/ml to about 2
mg/ml. Moreover, the time in which the array of target:oligonucleotide hybridization complexes is in contact with the
nuclease will vary. Typically, nuclease treatment is carried out for a period of time ranging from about 5 minutes to 3
hours. Following treatment with the nuclease, the substrate is again washed with the hybridization buffer, and a
quantitative fluorescence image of the hybridization pattern is obtained by scanning the substrate with, for example, a confocal
microscope.
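The effect of nuclease treatment on the two fluorescence images can be summarized per column as a discrimination ratio. The sketch below is illustrative only: the ratio of the perfectly matched probe's signal to the strongest mismatched probe's signal is one possible measure and is not prescribed by the text, the function name is hypothetical, and the signal values are invented for the example rather than measured.

```python
def discrimination(column_signals: dict, perfect_lane: str) -> float:
    """Ratio of the perfect-match signal to the strongest mismatch signal."""
    perfect = column_signals[perfect_lane]
    best_mismatch = max(
        signal for lane, signal in column_signals.items() if lane != perfect_lane
    )
    if best_mismatch <= 0:
        return float("inf")
    return perfect / best_mismatch


if __name__ == "__main__":
    # Invented signals for one column, scanned before and after treatment
    before = {"A": 100, "C": 80, "G": 60, "T": 70}
    after = {"A": 90, "C": 20, "G": 10, "T": 15}
    print(discrimination(before, "A"), discrimination(after, "A"))
```

A higher ratio after treatment would indicate that mismatched hybrids lost proportionally more label than the perfectly complementary hybrid.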

As such, nuclease treatment can be used following hybridization to improve the quality of hybridization signals on
high density oligonucleotide arrays and, in turn, to more accurately determine the sequence of the target nucleic acid.
It will be readily apparent to those of skill in the art that the foregoing is intended to illustrate, and not restrict, the
way in which an array of target:oligonucleotide hybrid complexes can be treated with a nuclease to improve hybridization
signals on high density oligonucleotide arrays.

In another aspect, the present invention provides a method for obtaining sequencing information about an unlabeled
target oligonucleotide, comprising: (a) contacting an unlabeled target oligonucleotide with a library of labeled
oligonucleotide probes, each of the oligonucleotide probes having a known sequence and being attached to a solid support at
a known position, to hybridize the target oligonucleotide to at least one member of the library of probes, thereby forming
a hybridized library; (b) contacting the hybridized library with a nuclease capable of cleaving double-stranded
oligonucleotides to release from the hybridized library a portion of the labeled oligonucleotide probes or fragments thereof;
and (c) identifying the positions of the hybridized library from which labeled probes or fragments thereof have been
removed, to determine the sequence of the unlabeled target oligonucleotide.
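Step (c) of the method above amounts to finding the positions at which label has been lost. A minimal sketch follows, assuming the library is scanned before and after nuclease treatment and that a position counts as released when its signal falls by at least a chosen fraction. The two-scan comparison, the threshold of 0.5 and the function name are illustrative assumptions, not part of the disclosure.

```python
def released_positions(before: dict, after: dict, min_fractional_loss: float = 0.5) -> list:
    """Return array positions whose label signal dropped by at least the
    given fraction between the scan before and the scan after treatment.

    before and after map a position, e.g. a (row, column) tuple, to a signal.
    """
    released = []
    for position, start in before.items():
        if start <= 0:
            continue  # nothing to lose at an unlabeled or empty position
        loss = (start - after.get(position, 0.0)) / start
        if loss >= min_fractional_loss:
            released.append(position)
    return sorted(released)


if __name__ == "__main__":
    before = {(0, 0): 100.0, (0, 1): 100.0, (1, 0): 100.0}
    after = {(0, 0): 95.0, (0, 1): 10.0, (1, 0): 40.0}
    print(released_positions(before, after))
```

Because each position holds a probe of known sequence, the returned positions identify the probes that hybridized to the unlabeled target.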

In this aspect of the invention a library of oligonucleotide probes is prepared, for example, using the VLSIPS™
technology described above (See, Section III, supra). Once the library of probes has been prepared, the 5' terminus of
each probe can be labeled with a detectable label such as those described in Section V, infra. Preferably, the label is a
fluorescent label.

The library of labeled oligonucleotide probes is then contacted with an unlabeled target oligonucleotide. The
unlabeled oligonucleotide can be synthetic or can be isolated from natural sources. In preferred embodiments, the unlabeled
oligonucleotide is genomic DNA or RNA. For example, purified DNA or a whole-cell digest which has been partially
sequenced can be lightly fragmented (e.g., by digestion with a restriction enzyme which provides infrequent cuts and
which infrequently cuts within any of the regions desired to be resequenced). The fragments of interest can be separated
using a column containing probes complementary to a part of the sequence of interest. The complementary fragments
are bound in the column while the remaining DNA is washed through. The fragments of interest are then removed (e.g.,
by heat or by chemical means) and contacted with the library of probes.

Once the library of probes has been contacted with the target oligonucleotide under conditions sufficient for
hybridization to occur, the resulting hybridized library is contacted with an appropriate nuclease enzyme. Alternatively, the
nuclease can be introduced to the library in the same mixture as the target oligonucleotide. The nuclease can be any
of a variety of commercially available nucleases which are capable of cleaving double-stranded DNA. Examples of such
nucleases include DNase I.

The hybridized library which has been contacted with the nuclease is then washed to remove the label from those
positions wherein hybridization has taken place. By scanning the washed library with a detector to determine the
presence or absence of labels in a region, hybridization information can be obtained. This method is applicable to
resequencing tilings (see, Section IV, supra), mutation detection and other combinatorial methods. The present method
has further advantages, including (i) the use of unlabeled target oligonucleotide, which simplifies target preparation and
allows genomic material to be used directly, (ii) the use of a variety of nucleases which can be selected for cleaving the
target and probe, the probe alone, or probe-probe interaction, and (iii) application using existing VLSIPS technology.
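
As a rough illustration of the readout step described above, the following Python sketch lists the probes whose label is absent after nuclease treatment and washing. It assumes that every probe carried a label before treatment, that the scan is available as a mapping from array position to intensity, and that a single background cutoff separates "label present" from "label absent". The names `probe_at`, `scan` and `background`, and the cutoff value, are hypothetical and are not taken from the method itself.

```python
def probes_with_label_removed(probe_at, scan, background=10.0):
    """Return the probe sequences at positions where no label remains.

    probe_at:   dict mapping (row, col) -> known probe sequence (assumed layout)
    scan:       dict mapping (row, col) -> fluorescence intensity after washing
    background: hypothetical cutoff; signal at or below it counts as "no label"
    """
    removed = []
    for position, sequence in sorted(probe_at.items()):
        # A position missing from the scan is treated as zero signal.
        if scan.get(position, 0.0) <= background:
            removed.append(sequence)
    return removed


if __name__ == "__main__":
    layout = {(0, 0): "ACGT", (0, 1): "ACGA", (1, 0): "TTGC"}
    intensities = {(0, 0): 3.0, (0, 1): 250.0, (1, 0): 5.5}
    print(probes_with_label_removed(layout, intensities))
```

In this toy layout the probes at positions (0, 0) and (1, 0) are reported, since their signal falls below the assumed cutoff.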

The foregoing enzymatic discrimination enhancement methods can be used in all instances where improved
discrimination between fully complementary hybrids and those that differ by one or more base pairs would be helpful. More
particularly, such methods can be used to more accurately determine the sequence (e.g., de novo sequencing), or
monitor mutations, or resequence the target nucleic acid (i.e., such methods can be used in conjunction with a second
sequencing procedure to provide independent verification).

B. Enhanced Discrimination Using Ligation Reactions

Ligation reactions can be used to discriminate between fully complementary hybrids and those that differ by one or
more base pairs. More particularly, an array of oligonucleotides is generated on a substrate (in the 3' to 5' direction)
using any one of the methods described above. The oligonucleotides in the array are generally shorter in length than
the target nucleic acid so that when hybridized to the target nucleic acid, the target nucleic acid generally has a 3'
overhang. In this embodiment, the target nucleic acid is not necessarily labelled. After the array of oligonucleotides has
been combined with the target nucleic acid to form target-oligonucleotide hybrid complexes, the target-oligonucleotide
hybrid complexes are contacted with a ligase and a labelled, ligatable probe or, alternatively, with a pool of labelled,
ligatable probes. The ligation reaction of the labelled, ligatable probes to the 5' end of the oligonucleotide probes on the
substrate will occur, in the presence of the ligase, predominantly when the target:oligonucleotide hybrid has formed with
correct base-pairing near the 5' end of the oligonucleotide probe and where there is a suitable 3' overhang of the target
nucleic acid to serve as a template for hybridization and ligation. After the ligation reaction, the substrate is washed
(multiple times if necessary) with water at a temperature of about 40°C to 50°C to remove the target nucleic acid and
the labelled, unligated probes. Thereafter, a quantitative fluorescence image of the hybridization pattern is obtained by
scanning the substrate with, for example, a confocal microscope, and labelled oligonucleotide probes, i.e., the
oligonucleotide probes which are perfectly complementary to the target nucleic acid, are identified. Using this information,
sequence information about the target nucleic acid can be determined.

Any enzyme that catalyzes the formation of a phosphodiester bond at the site of a single-stranded break in duplex
DNA can be used to enhance discrimination between fully complementary hybrids and those that differ by one or more
base pairs. Such ligases include, but are not limited to, T4 DNA ligase, ligases isolated from E. coli and ligases isolated
from other bacteriophages. The concentration of the ligase will vary depending on the particular ligase used, the
concentration of target and buffer conditions, but will typically range from about 500 units/ml to about 5,000 units/ml.
Moreover, the time in which the array of target:oligonucleotide hybridization complexes is in contact with the ligase will vary.
Typically, the ligase treatment is carried out for a period of time ranging from minutes to hundreds of hours.

In a further embodiment, the present invention provides another method which can be used to improve discrimination
of base-pair mismatches near the 5' end of the immobilized probes. More particularly, the present invention provides a
method for sequencing an unlabeled target oligonucleotide, the method comprising: (a) combining: (i) a substrate
comprising an array of positionally distinguishable oligonucleotide probes each of which has a constant region and a variable
region, the variable region capable of binding to a defined subsequence of preselected length; (ii) a constant
oligonucleotide having a sequence which is complementary to the constant region of the oligonucleotide probes; (iii) a target
oligonucleotide whose sequence is to be determined; and (iv) a ligase, thereby forming target
oligonucleotide-oligonucleotide probe hybrid complexes of complementary subsequences of known sequence; (b) contacting the target
oligonucleotide-oligonucleotide probe hybrid complexes with a ligase and a pool of labelled, ligatable oligonucleotide probes
of a preselected length, the pool of labelled, ligatable oligonucleotide probes representing all possible sequences of the
preselected length; (c) removing unbound target nucleic acid and labelled, unligated oligonucleotide probes; and (d)
determining which of the oligonucleotide probes contain the labelled, ligatable oligonucleotide probe as an indication of
a subsequence which is perfectly complementary to a subsequence of the target oligonucleotide. See, FIG. 8, which
illustrates this method.

In this method, the constant region is typically from about 10 to about 14 nucleotides in length, whereas the variable
region is typically from about 6 to about 8 nucleotides in length. The labelled, ligatable oligonucleotide probes have a
preselected length, and the pool of such probes represents all possible sequences of the preselected length. Thus, if
the probe is 6 nucleotides in length, all possible 6-mers are present in the pool. As with the previously described method,
any enzyme that catalyzes the formation of a phosphodiester bond at the site of a single-strand break in duplex DNA
can be used to enhance discrimination between fully complementary hybrids and those that differ by one or more base
pairs. Such ligases include, but are not limited to, T4 DNA ligase, ligases isolated from E. coli and ligases isolated from
other bacteriophages. The concentration of the ligase will vary depending on the particular ligase used, the concentration
of target and buffer conditions, but will typically range from about 500 units/ml to about 5,000 units/ml. Moreover, the
time in which the array of target oligonucleotide:oligonucleotide probe hybrid complexes is in contact with the ligase will
vary. Typically, the ligase treatment is carried out for a period of time ranging from minutes to hundreds of hours.
In addition, it will be readily apparent to those of skill that the two ligation reactions can either be done sequentially or,
alternatively, simultaneously in a single reaction mix that contains: target oligonucleotides; constant oligonucleotides; a
pool of labeled, ligatable probes; and a ligase.
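
The size of such a pool follows directly from the four-letter alphabet: a pool covering every sequence of length n contains 4^n members, which is 4,096 for 6-mers. The Python sketch below enumerates the pool. The function name `all_nmers` and the DNA alphabet "ACGT" are assumptions made for illustration only.

```python
from itertools import product


def all_nmers(n, alphabet="ACGT"):
    """Return every sequence of length n over the alphabet, in lexicographic order."""
    return ["".join(bases) for bases in product(alphabet, repeat=n)]


if __name__ == "__main__":
    pool = all_nmers(6)
    print(len(pool))  # 4096, i.e. 4 ** 6
```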

In the above method, the first ligation reaction will occur only if the 5' end of the target oligonucleotide (i.e., the last
3-4 bases) matches the variable region of the oligonucleotide probe. Similarly, the second ligation reaction, which adds
a label to the probe, will occur efficiently only if the first ligation reaction was successful and if the ligated target is
complementary to the 5' end of the probe. Thus, this method provides for specificity at both ends of the variable region.
Moreover, this method is advantageous in that it allows a shorter variable probe region to be used, increases probe:target
specificity and removes the necessity of labeling the target.

As such, ligation reactions can effectively be used to improve discrimination of base-pair mismatches near the 5'
end of the immobilized probe, mismatches that are often poorly discriminated following hybridization alone. The foregoing
discrimination enhancement methods involving the use of ligation reactions can be used in all instances where improved
discrimination between fully complementary hybrids and those that differ by one or more base pairs would be helpful.
More particularly, such methods can be used to more accurately determine the sequence (e.g., de novo sequencing),
or monitor mutations, or resequence the target nucleic acid (i.e., such methods can be used in conjunction with a second
sequencing procedure to provide independent verification). It will be readily apparent to those of skill in the art that the
foregoing is intended to illustrate, and not restrict, the way in which an array of target:oligonucleotide hybrid complexes
can be treated with a ligase and a pool of labelled, ligatable probes to improve hybridization signals on high density
oligonucleotide arrays.

VI. Detection Methods 

Methods for detection depend upon the label selected. The criteria for selecting an appropriate label are discussed
below; however, a fluorescent label is preferred because of its extreme sensitivity and simplicity. Standard labeling
procedures are used to determine the positions where interactions between a target sequence and a reagent take place.
For example, if a target sequence is labeled and exposed to a matrix of different oligonucleotide probes, only those
locations where the oligonucleotides interact with the target will exhibit any signal. In addition to using a label, other
methods may be used to scan the matrix to determine where interaction takes place. The spectrum of interactions can,
of course, be determined in a temporal manner by repeated scans of interactions which occur at each of a multiplicity
of conditions. However, instead of testing each individual interaction separately, a multiplicity of sequence interactions
may be simultaneously determined on a matrix.

A. Labeling Techniques 

The target nucleic acid can be labeled using any of a number of convenient detectable markers. A fluorescent label
is preferred because it provides a very strong signal with low background. It is also optically detectable at high resolution
and sensitivity through a quick scanning procedure. Other potential labeling moieties include radioisotopes,
chemiluminescent compounds, labeled binding proteins, heavy metal atoms, spectroscopic markers, magnetic labels, and linked
enzymes.

In another embodiment, different targets can be simultaneously sequenced where each target has a different label.
For instance, one target could have a green fluorescent label and a second target could have a red fluorescent label.
The scanning step will distinguish sites of binding of the red label from those binding the green fluorescent label. Each
sequence can be analyzed independently of the other.

Suitable chromogens which can be employed include those molecules and compounds which absorb light in a
distinctive range of wavelengths so that a color can be observed or, alternatively, which emit light when irradiated with
radiation of a particular wavelength or wavelength range, e.g., fluorescers.

A wide variety of suitable dyes are available, being primarily chosen to provide an intense color with minimal
absorption by their surroundings. Illustrative dye types include quinoline dyes, triarylmethane dyes, acridine dyes, alizarine
dyes, phthaleins, insect dyes, azo dyes, anthraquinoid dyes, cyanine dyes, phenazathionium dyes, and phenazoxonium
dyes.

A wide variety of fluorescers can be employed either alone or, alternatively, in conjunction with quencher
molecules. Fluorescers of interest fall into a variety of categories having certain primary functionalities. These primary
functionalities include 1- and 2-aminonaphthalene, p,p'-diaminostilbenes, pyrenes, quaternary phenanthridine salts,
9-aminoacridines, p,p'-diaminobenzophenone imines, anthracenes, oxacarbocyanine, merocyanine, 3-aminoequilenin,
perylene, bisbenzoxazole, bis-p-oxazolyl benzene, 1,2-benzophenazin, retinol, bis-3-aminopyridinium salts,
hellebrigenin, tetracycline, sterophenol, benzimidazolylphenylamine, 2-oxo-3-chromen, indole, xanthen, 7-hydroxycoumarin,
phenoxazine, salicylate, strophanthidin, porphyrins, triarylmethanes and flavin. Individual fluorescent compounds which
have functionalities for linking or which can be modified to incorporate such functionalities include, e.g., dansyl chloride;
fluoresceins such as 3,6-dihydroxy-9-phenylxanthhydrol; rhodamineisothiocyanate; N-phenyl 1-amino-8-sulfonatonaphthalene;
N-phenyl 2-amino-6-sulfonatonaphthalene; 4-acetamido-4-isothiocyanato-stilbene-2,2'-disulfonic acid;
pyrene-3-sulfonic acid; 2-toluidinonaphthalene-6-sulfonate; N-phenyl, N-methyl 2-aminonaphthalene-6-sulfonate; ethidium
bromide; stebrine; auromine-0,2-(9-anthroyl)palmitate; dansyl phosphatidylethanolamine; N,N'-dioctadecyl
oxacarbocyanine; N,N'-dihexyl oxacarbocyanine; merocyanine, 4(3'pyrenyl)butyrate; d-3-aminodesoxy-equilenin;
12-(9'anthroyl)stearate; 2-methylanthracene; 9-vinylanthracene; 2,2'(vinylene-p-phenylene)bisbenzoxazole;
p-bis[2-(4-methyl-5-phenyl-oxazolyl)]benzene; 6-dimethylamino-1,2-benzophenazin; retinol; bis(3'-aminopyridinium)
1,10-decandiyl diiodide; sulfonaphthylhydrazone of hellibrienin; chlorotetracycline;
N(7-dimethylamino-4-methyl-2-oxo-3-chromenyl)maleimide; N-[p-(2-benzimidazo
4-chloro-7-nitro-2,1,3benzooxadiazole; merocyanine 540; resorufin; rose bengal; and 2,4-diphenyl-3(2H)-furanone.

Desirably, fluorescers should absorb light above about 300 nm, preferably about 350 nm, and more preferably above
about 400 nm, usually emitting at wavelengths greater than about 10 nm higher than the wavelength of the light absorbed.
It should be noted that the absorption and emission characteristics of the bound dye can differ from the unbound dye.
Therefore, when referring to the various wavelength ranges and characteristics of the dyes, it is intended to indicate the
dyes as employed and not the dye which is unconjugated and characterized in an arbitrary solvent.

Fluorescers are generally preferred because by irradiating a fluorescer with light, one can obtain a plurality of
emissions. Thus, a single label can provide for a plurality of measurable events.

Detectable signal can also be provided by chemiluminescent and bioluminescent sources. Chemiluminescent
sources include a compound which becomes electronically excited by a chemical reaction and can then emit light which
serves as the detectable signal or donates energy to a fluorescent acceptor. A diverse number of families of compounds
have been found to provide chemiluminescence under a variety of conditions. One family of compounds is
2,3-dihydro-1,4-phthalazinedione. The most popular compound is luminol, which is the 5-amino compound. Other members of the
family include the 5-amino-6,7,8-trimethoxy- and the dimethylamino[ca]benz analog. These compounds can be made
to luminesce with alkaline hydrogen peroxide or calcium hypochlorite and base. Another family of compounds is the
2,4,5-triphenylimidazoles, with lophine as the common name for the parent product. Chemiluminescent analogs include
para-dimethylamino and -methoxy substituents. Chemiluminescence can also be obtained with oxalates, usually oxalyl
active esters, e.g., p-nitrophenyl and a peroxide, e.g., hydrogen peroxide, under basic conditions. Alternatively, luciferins
can be used in conjunction with luciferase or lucigenins to provide bioluminescence.

Spin labels are provided by reporter molecules with an unpaired electron spin which can be detected by electron
spin resonance (ESR) spectroscopy. Exemplary spin labels include organic free radicals, transition metal complexes,
particularly vanadium, copper, iron, and manganese, and the like. Exemplary spin labels include nitroxide free radicals.

B. Scanning System

With the automated detection apparatus, the correlation of specific positional labeling is converted to the presence
on the target of sequences for which the oligonucleotides have specificity of interaction. Thus, the positional information
is directly converted to a database indicating what sequence interactions have occurred. For example, in a nucleic acid
hybridization application, the sequences which have interacted between the substrate matrix and the target molecule
can be directly listed from the positional information. The detection system used is described in PCT publication no.
WO 90/15070 and U.S.S.N. 07/624,120. Although the detection described therein is a fluorescence detector, the detector
can be replaced by a spectroscopic or other detector. The scanning system can make use of a moving detector relative
to a fixed substrate, a fixed detector with a moving substrate, or a combination. Alternatively, mirrors or other apparatus
can be used to transfer the signal directly to the detector. See, e.g., U.S.S.N. 07/624,120, which is hereby incorporated
herein by reference.

The detection method will typically also incorporate some signal processing to determine whether the signal at a 
particular matrix position is a true positive or may be a spurious signal. For example, a signal from a region which has 
actual positive signal may tend to spread over and provide a positive signal in an adjacent region which actually should 
not have one. This may occur, e.g., where the scanning system is not properly discriminating with sufficiently high res- 
olution in its pixel density to separate the two regions. Thus, the signal over the spatial region may be evaluated pixel 
by pixel to determine the locations and the actual extent of positive signal. A true positive signal should, in theory, show 
a uniform signal at each pixel location. Thus, a plot of the number of pixels against actual signal intensity should
show a clearly uniform signal intensity. Regions where the signal intensities show a fairly wide dispersion may be
particularly suspect, and the scanning system may be programmed to more carefully scan those positions.
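
The pixel-by-pixel check described above can be sketched in Python as follows. The text does not prescribe a particular measure of dispersion, so this sketch assumes the coefficient of variation (standard deviation divided by mean) and an arbitrary cutoff of 0.25. The function name `is_suspect_region` and the parameter `max_cv` are hypothetical.

```python
from statistics import mean, pstdev


def is_suspect_region(pixels, max_cv=0.25):
    """Flag a region whose pixel intensities are widely dispersed.

    pixels: non-empty list of intensities for the pixels of one array region
    max_cv: assumed cutoff on the coefficient of variation (std / mean)
    """
    average = mean(pixels)
    if average == 0:
        # No signal at all: nothing to disperse, so the region is not flagged.
        return False
    return pstdev(pixels) / average > max_cv


if __name__ == "__main__":
    print(is_suspect_region([100, 102, 98, 101]))  # nearly uniform signal
    print(is_suspect_region([100, 10, 100, 5]))    # signal bleeding in from a neighbour
```

A region flagged in this way would be a candidate for the more careful rescan mentioned above.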

More sophisticated signal processing techniques can be applied to the initial determination of whether a positive
signal exists or not. See, e.g., U.S.S.N. 07/624,120.

From a listing of those sequences which interact, data analysis may be performed on a series of sequences. For
example, in a nucleic acid sequence application, each of the sequences may be analyzed for their overlap regions and
the original target sequence may be reconstructed from the collection of specific subsequences obtained therein. Other
sorts of analyses for different applications may also be performed, and because the scanning system directly interfaces
with a computer the information need not be transferred manually. This provides for the ability to handle large amounts
of data with very little human intervention. This, of course, provides significant advantages over manual manipulations.
Increased throughput and reproducibility are thereby provided by the automation of the vast majority of steps in any of
these applications.

C. Data Analysis

Data analysis will differ depending upon whether sequencing de novo or resequencing is being done, but will typically
involve aligning the proper sequences with their overlaps to determine the target sequence or a mutation in the target
sequence. Although the target "sequence" may not specifically correspond to any specific molecule, especially where
the target sequence is broken and fragmented up in the sequencing process, the sequence corresponds to a contiguous
sequence of the subfragments.

The data analysis can be performed manually or, preferably, by a computer using an appropriate program. Although
the specific manipulations necessary to reassemble the target sequence from fragments may take many forms, one
embodiment uses a sorting program to sort all of the subsequences using a defined hierarchy. The hierarchy need not
necessarily correspond to any physical hierarchy, but provides a means to determine, in order, which subfragments have
actually been found in the target sequence. In this manner, overlaps can be checked and found directly rather than
having to search throughout the entire set after each selection process. For example, where the oligonucleotide probes
are 10-mers, the first 9 positions can be sorted. A particular subsequence can be selected, as in the examples, to
determine where the process starts. Analogous to the theoretical example provided above, the sorting procedure provides
the ability to immediately find the position of the subsequence which contains the first 9 positions and can compare
whether there exists more than 1 subsequence sharing the first 9 positions. In fact, the computer can easily generate all
of the possible target sequences which contain given combinations of subsequences. Typically, there will be only one,
but in various situations, there will be more.
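
A minimal Python sketch of this sort-and-extend procedure is given below. It is an illustration under stated assumptions rather than the program contemplated above: the subsequences are taken to be error-free, all of equal length k (with k of at least 2), and each is assumed to occur exactly once in the target. The sorted list plays the role of the "defined hierarchy", and binary search finds every subsequence that begins with the last k-1 bases of the growing sequence. The function names `successors` and `reconstruct` are hypothetical.

```python
from bisect import bisect_left


def successors(sorted_subseqs, prefix):
    """All subsequences starting with `prefix`, located by binary search."""
    index = bisect_left(sorted_subseqs, prefix)
    matches = []
    while index < len(sorted_subseqs) and sorted_subseqs[index].startswith(prefix):
        matches.append(sorted_subseqs[index])
        index += 1
    return matches


def reconstruct(subseqs, start):
    """Enumerate every sequence that begins with `start` and uses each
    subsequence exactly once, extending one base at a time by overlap."""
    ordered = sorted(set(subseqs))
    k = len(start)  # assumed: k >= 2 and all subsequences have length k
    results = []

    def extend(sequence, unused):
        if not unused:
            results.append(sequence)
            return
        # Candidates must overlap the last k-1 bases of the sequence so far.
        for candidate in successors(ordered, sequence[-(k - 1):]):
            if candidate in unused:
                extend(sequence + candidate[-1], unused - {candidate})

    extend(start, frozenset(ordered) - {start})
    return results


if __name__ == "__main__":
    found = ["GTAC", "ATGC", "CGTA", "TGCG", "GCGT"]
    print(reconstruct(found, "ATGC"))
```

When more than one subsequence shares the same k-1 base prefix, the loop branches, and the function returns every consistent reconstruction rather than a single one.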

Generally, such computer programs provide for automated scanning of the substrate to determine the positions of
oligonucleotide and target interaction. Simple processing of the intensity of the signal may be incorporated to filter out
clearly spurious signals. The positions with positive interaction are correlated with the sequence specificity of specific
matrix positions, to generate the set of matching subsequences. This information is further correlated with other target
sequence information, e.g., restriction fragment analysis. The sequences are then aligned using overlap data, thereby
leading to possible corresponding target sequences which will, optimally, correspond to a single target sequence.

VII. Applications 

The enzymatic discrimination enhancement methods provided by the present invention have very broad applications.
Although described specifically for polynucleotide sequences, similar sequencing, fingerprinting, mapping, and
screening procedures may be applied to polypeptide, carbohydrate, or other polymers. Such methods can be used in all
instances where improved discrimination between fully complementary hybrids and those that differ by one or more
base pairs would be helpful. More particularly, such methods can be used with de novo sequencing, or in conjunction
with a second sequencing procedure to provide independent verification (i.e., resequencing). See, e.g., Science
242:1245 (1988). For example, a large polynucleotide sequence defined by either the Maxam and Gilbert technique or
by the Sanger technique may be verified by using the present invention.

In addition, by selection of appropriate probes, a polynucleotide sequence can be fingerprinted. Fingerprinting is a
less detailed sequence analysis which usually involves the characterization of a sequence by a combination of defined
features. Sequence fingerprinting is particularly useful because the repertoire of possible features which can be tested
is virtually infinite. Moreover, the stringency of matching is also variable depending upon the application. A Southern
Blot analysis may be characterized as a means of simple fingerprint analysis.

Fingerprinting analysis may be performed to the resolution of specific nucleotides, or may be used to determine 
homologies, most commonly for large segments. In particular, an array of oligonucleotide probes of virtually any workable 
size may be positionally localized on a matrix and used to probe a sequence for either absolute complementary matching, 
or homology to the desired level of stringency using selected hybridization conditions.
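
One simple way to picture a fingerprint is as the set of probes that a sequence matches. The Python sketch below makes several simplifying assumptions that are not part of the description above: hybridization is modelled as exact substring matching on a single strand, stringency is ignored, and two fingerprints are compared with the Jaccard index, which is chosen here purely for illustration. The function names `fingerprint` and `similarity` are hypothetical.

```python
def fingerprint(sequence, probes):
    """Set of probes that occur as exact substrings of the sequence."""
    return frozenset(probe for probe in probes if probe in sequence)


def similarity(fp_a, fp_b):
    """Jaccard index of two fingerprints (defined as 1.0 when both are empty)."""
    union = fp_a | fp_b
    if not union:
        return 1.0
    return len(fp_a & fp_b) / len(union)


if __name__ == "__main__":
    probes = ["ACG", "CGT", "TTT", "GGA"]
    first = fingerprint("AACGTT", probes)
    second = fingerprint("ACGGA", probes)
    print(sorted(first), sorted(second), similarity(first, second))
```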

In addition, the present invention provides means for mapping analysis of a target sequence or sequences. Mapping
will usually involve the sequential ordering of a plurality of various sequences, or may involve the localization of a
particular sequence within a plurality of sequences. This may be achieved by immobilizing particular large segments onto
the matrix and probing with a shorter sequence to determine which of the large sequences contain that smaller sequence.
Alternatively, relatively shorter probes of known or random sequence may be immobilized to the matrix and a map of
various different target sequences may be determined from overlaps. Principles of such an approach are described in
some detail by Evans et al. (1989) "Physical Mapping of Complex Genomes by Cosmid Multiplex Analysis," Proc. Natl.
Acad. Sci. USA 86:5030-5034; Michiels, et al., "Molecular Approaches to Genome Analysis: A Strategy for the
Construction of Ordered Overlap Clone Libraries," CABIOS 3:203-210 (1987); Olsen, et al., "Random-Clone Strategy for
Genomic Restriction Mapping in Yeast," Proc. Natl. Acad. Sci. USA 83:7826-7830 (1986); Craig, et al., "Ordering of
Cosmid Clones Covering the Herpes Simplex Virus Type I (HSV-I) Genome: A Test Case for Fingerprinting by
Hybridization," Nuc. Acids Res. 18:2653-2660 (1990); and Coulson, et al., "Toward a Physical Map of the Genome of the
Nematode Caenorhabditis elegans," Proc. Natl. Acad. Sci. USA 83:7821-7825 (1986); each of which is hereby
incorporated herein by reference.
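
The first mapping strategy, probing immobilized large segments with a shorter sequence, has a simple in-silico analogue, sketched below in Python. The sketch assumes that the segment sequences are already known and that a match is an exact substring on a single strand; a real experiment would report only which segments give signal, not the offsets. The function name `locate` and the segment names are hypothetical.

```python
def locate(segments, probe):
    """Map each segment name to the offsets at which the probe occurs.

    segments: dict mapping a segment name -> its sequence (assumed known)
    probe:    the shorter sequence used to interrogate the segments
    Segments that do not contain the probe are left out of the result.
    """
    hits = {}
    for name, sequence in segments.items():
        offsets = []
        start = sequence.find(probe)
        while start != -1:
            offsets.append(start)
            # Continue one base further so overlapping matches are also found.
            start = sequence.find(probe, start + 1)
        if offsets:
            hits[name] = offsets
    return hits


if __name__ == "__main__":
    clones = {
        "cosmid_1": "AAGCTTGCA",
        "cosmid_2": "TTTTGGGG",
        "cosmid_3": "GCTTAAGCTT",
    }
    print(locate(clones, "GCTT"))
```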

Fingerprinting analysis also provides a means of identification. In addition to its value in apprehension of criminals
from whom a biological sample, e.g., blood, has been collected, fingerprinting can ensure personal identification for
other reasons. For example, it may be useful for identification of bodies in tragedies such as fire, flood, and vehicle
crashes. In other cases the identification may be useful in identification of persons suffering from amnesia, or of missing
persons. Other forensics applications include establishing the identity of a person, e.g., military identification "dog tags",
or may be used in identifying the source of particular biological samples. Fingerprinting technology is described, e.g.,
in Carrano, et al., "A High-Resolution, Fluorescence-Based, Semi-automated Method for DNA Fingerprinting," Genomics
4:120-136 (1989), which is hereby incorporated herein by reference.

The fingerprinting analysis may be used to perform various types of genetic screening. For example, a single sub- 
strate may be generated with a plurality of screening probes, allowing for the simultaneous genetic screening for a large 
number of genetic markers. Thus, prenatal or diagnostic screening can be simplified, economized, and made more 
generally accessible. 

In addition to the sequencing, fingerprinting, and mapping applications, the present invention also provides means
for determining specificity of interaction with particular sequences. Many of these applications are described in U.S.S.N.
07/362,901 (VLSIPS parent), U.S.S.N. 07/492,462 (VLSIPS CIP), U.S.S.N. 07/435,316 (caged biotin parent), and
U.S.S.N. 07/612,671 (caged biotin CIP), which are incorporated herein by reference.

VIII. Libraries of Unimolecular, Double-Stranded Oligonucleotides

In one aspect, the present invention provides libraries of unimolecular double-stranded oligonucleotides, each
member of the library having the formula:

Y-L¹-X¹-L²-X²

in which Y represents a solid support, X¹ and X² represent a pair of complementary oligonucleotides, L¹ represents a
bond or a spacer, and L² represents a linking group having sufficient length such that X¹ and X² form a double-stranded
oligonucleotide.
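
To make the complementarity requirement concrete, the Python sketch below derives the second strand from the first for a DNA member of the library. It assumes strict Watson-Crick pairing, DNA bases only, and both strands written 5' to 3', so that the second strand is the reverse complement of the first. The function names and the placeholder string used for the linking group are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def library_member(x1, linker="L2"):
    """Describe one member as (X1, linking group, X2), with X2 chosen
    so that it can fold back and pair with X1."""
    return (x1, linker, reverse_complement(x1))


if __name__ == "__main__":
    print(library_member("AACG"))
```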

The solid support may be biological, nonbiological, organic, inorganic, or a combination of any of these, existing as
particles, strands, precipitates, gels, sheets, tubing, spheres, containers, capillaries, pads, slices, films, plates, slides,
etc. The solid support is preferably flat but may take on alternative surface configurations. For example, the solid support
may contain raised or depressed regions on which synthesis takes place. In some embodiments, the solid support will
be chosen to provide appropriate light-absorbing characteristics. For example, the support may be a polymerized
Langmuir Blodgett film, functionalized glass, Si, Ge, GaAs, GaP, SiO₂, SiN₄, modified silicon, or any one of a variety of gels
or polymers such as (poly)tetrafluoroethylene, (poly)vinylidendifluoride, polystyrene, polycarbonate, or combinations
thereof. Other suitable solid support materials will be readily apparent to those of skill in the art. Preferably, the surface
of the solid support will contain reactive groups, which could be carboxyl, amino, hydroxyl, thiol, or the like. More
preferably, the surface will be optically transparent and will have surface Si-OH functionalities, such as are found on silica
surfaces.

Attached to the solid support is an optional spacer, L¹. The spacer molecules are preferably of sufficient length to
permit the double-stranded oligonucleotides in the completed member of the library to interact freely with molecules
exposed to the library. The spacer molecules, when present, are typically 6-50 atoms long to provide sufficient exposure
for the attached double-stranded DNA molecule. The spacer, L¹, is comprised of a surface attaching portion and a longer
chain portion. The surface attaching portion is that part of L¹ which is directly attached to the solid support. This portion
can be attached to the solid support via carbon-carbon bonds using, for example, supports having
(poly)trifluorochloroethylene surfaces, or preferably, by siloxane bonds (using, for example, glass or silicon oxide as the solid support).
Siloxane bonds with the surface of the support are formed in one embodiment via reactions of surface attaching portions
bearing trichlorosilyl or trialkoxysilyl groups. The surface attaching groups will also have a site for attachment of the
longer chain portion. For example, groups which are suitable for attachment to a longer chain portion would include
amines, hydroxyl, thiol, and carboxyl. Preferred surface attaching portions include aminoalkylsilanes and
hydroxyalkylsilanes. In particularly preferred embodiments, the surface attaching portion of L¹ is either
bis(2-hydroxyethyl)aminopropyltriethoxysilane, 2-hydroxyethylaminopropyltriethoxysilane, aminopropyltriethoxysilane or
hydroxypropyltriethoxysilane.

The longer chain portion can be any of a variety of molecules which are inert to the subsequent conditions for
polymer synthesis. These longer chain portions will typically be aryl acetylene, ethylene glycol oligomers containing 2-14
monomer units, diamines, diacids, amino acids, peptides, or combinations thereof. In some embodiments, the longer
chain portion is a polynucleotide. The longer chain portion which is to be used as part of L¹ can be selected based upon
its hydrophilic/hydrophobic properties to improve presentation of the double-stranded oligonucleotides to certain receptors,
proteins or drugs. The longer chain portion of L¹ can be constructed of polyethyleneglycols, polynucleotides,
alkylene, polyalcohol, polyester, polyamine, polyphosphodiester and combinations thereof. Additionally, for use in synthesis
of the libraries of the invention, L¹ will typically have a protecting group attached to a functional group (i.e.,
hydroxyl, amino or carboxylic acid) on the distal or terminal end of the chain portion (opposite the solid support). After
deprotection and coupling, the distal end is covalently bound to an oligomer.

Attached to the distal end of L¹ is an oligonucleotide, X¹, which is a single-stranded DNA or RNA molecule. The
oligonucleotides which are part of the present invention are typically of from about 4 to about 100 nucleotides in length.
Preferably, X¹ is an oligonucleotide which is about 6 to about 30 nucleotides in length. The oligonucleotide is typically
linked to L¹ via the 3'-hydroxyl group of the oligonucleotide and a functional group on L¹ which results in the formation
of an ether, ester, carbamate or phosphate ester linkage.

Attached to the distal end of X¹ is a linking group, L², which is flexible and of sufficient length that X¹ can effectively
hybridize with X². The length of the linker will typically be a length which is at least the length spanned by two nucleotide
monomers, and preferably at least four nucleotide monomers, while not being so long as to interfere with either the pairing
of X¹ and X² or any subsequent assays. The linking group itself will typically be an alkylene group (of from about 6 to
about 24 carbons in length), a polyethyleneglycol group (of from about 2 to about 24 ethyleneglycol monomers in a linear
configuration), a polyalcohol group, a polyamine group (e.g., spermine, spermidine and polymeric derivatives thereof),
a polyester group (e.g., poly(ethyl acrylate) having of from 3 to 15 ethyl acrylate monomers in a linear configuration), a
polyphosphodiester group, or a polynucleotide (having from about 2 to about 12 nucleic acids). Preferably, the linking
group will be a polyethyleneglycol group which is at least a tetraethyleneglycol, and more preferably, from about 1 to 4
hexaethyleneglycols linked in a linear array. For use in synthesis of the compounds of the invention, the linking group
will be provided with functional groups which can be suitably protected or activated. The linking group will be covalently
attached to each of the complementary oligonucleotides, X¹ and X², by means of an ether, ester, carbamate, phosphate
ester or amine linkage. The flexible linking group L² will be attached to the 5'-hydroxyl of the terminal monomer of X¹
and to the 3'-hydroxyl of the initial monomer of X². Preferred linkages are phosphate ester linkages which can be formed
in the same manner as the oligonucleotide linkages which are present in X¹ and X². For example, hexaethyleneglycol
can be protected on one terminus with a photolabile protecting group (i.e., NVOC or MeNPOC) and activated on the
other terminus with 2-cyanoethyl-N,N-diisopropylamino-chlorophosphite to form a phosphoramidite. This linking group
can then be used for construction of the libraries in the same manner as the photolabile-protected, phosphoramidite-activated
nucleotides. Alternatively, ester linkages to X¹ and X² can be formed when L² has terminal carboxylic acid
moieties (using the 5'-hydroxyl of X¹ and the 3'-hydroxyl of X²). Other methods of forming ether, carbamate or amine
linkages are known to those of skill in the art and particular reagents and references can be found in such texts as March,
Advanced Organic Chemistry, 4th Ed., Wiley-Interscience, New York, NY, 1992, incorporated herein by reference.

The oligonucleotide, X², which is covalently attached to the distal end of the linking group is, like X¹, a single-stranded
DNA or RNA molecule. The oligonucleotides which are part of the present invention are typically of from about 4 to about
100 nucleotides in length. Preferably, X² is an oligonucleotide which is about 6 to about 30 nucleotides in length and
exhibits complementarity to X¹ of from 90 to 100%. More preferably, X¹ and X² are 100% complementary. In one group
of embodiments, either X¹ or X² will further comprise a bulge or loop portion and exhibit complementarity of from 90 to
100% over the remainder of the oligonucleotide.
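
By way of illustration only, the 90 to 100% complementarity criterion can be checked computationally. The following Python sketch is not part of the disclosed methods; it assumes both strands are written 5' to 3', have equal length, pair in antiparallel orientation, and contain no bulge or loop portion. The function name and the example sequences are hypothetical.

```python
# Watson-Crick partners for DNA bases; RNA "U" is normalised to "T" below.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(x1: str, x2: str) -> float:
    """Percentage of positions at which x1 pairs with x2 (antiparallel).

    Both strands are given 5'->3'. Bulges, loops and gaps are not modelled,
    so the strands must have the same length.
    """
    x1 = x1.upper().replace("U", "T")
    x2 = x2.upper().replace("U", "T")
    if not x1 or len(x1) != len(x2):
        raise ValueError("strands must be non-empty and of equal length")
    # Reverse x2 so that position i of x1 faces its antiparallel partner.
    facing = x2[::-1]
    matches = sum(PAIR[a] == b for a, b in zip(x1, facing))
    return 100.0 * matches / len(x1)


# Hypothetical 10-mers: a perfect complement and a single-mismatch variant.
perfect = percent_complementarity("GCGCGTAAGG", "CCTTACGCGC")
one_mismatch = percent_complementarity("GCGCGTAAGG", "CCTTACGCGA")
```

Under these assumptions a single mismatch in a 10-mer gives 90%, the lower bound of the preferred range.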

In a particularly preferred embodiment, the solid support is a silica support, the spacer is a polyethyleneglycol conjugated
to an aminoalkylsilane, the linking group is a polyethyleneglycol group, and X¹ and X² are complementary oligonucleotides
each comprising from 6 to 30 nucleic acid monomers.

The library can have virtually any number of different members, and will be limited only by the number or variety of
compounds desired to be screened in a given application and by the synthetic capabilities of the practitioner. In one
group of embodiments, the library will have from 2 up to 100 members. In other groups of embodiments, the library will
have between 100 and 10,000 members, and between 10,000 and 1,000,000 members, preferably on a solid support.
In preferred embodiments, the library will have a density of more than 100 members at known locations per cm², preferably
more than 1,000 per cm², more preferably more than 10,000 per cm².
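
As a rough aid to reading these densities, the sketch below converts a density in members per cm² into the largest centre-to-centre spacing that would achieve it. It assumes, purely for illustration, that the members are laid out on a regular square grid; the text does not specify any layout, and the function name is hypothetical.

```python
import math


def max_feature_pitch_um(members_per_cm2: float) -> float:
    """Largest centre-to-centre spacing (micrometres) on a square grid
    that still reaches the given density. 1 cm = 10,000 micrometres."""
    if members_per_cm2 <= 0:
        raise ValueError("density must be positive")
    return 1e4 / math.sqrt(members_per_cm2)


# Densities named in the text: 100, 1,000 and 10,000 members per cm^2.
pitches = {d: max_feature_pitch_um(d) for d in (100, 1_000, 10_000)}
```

On this assumption, 100 members per cm² corresponds to a pitch of 1000 µm and 10,000 members per cm² to a pitch of 100 µm.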

Preparation of these libraries can typically be carried out using any of the methods described above for the prepa- 
ration of oligonucleotides on a solid support (e.g., light-directed methods, flow channel or spotting methods). 

IX. Libraries of Conformationally Restricted Probes



In still another aspect, the present invention provides libraries of conformationally-restricted probes. Each of the 
members of the library comprises a solid support having an optional spacer which is attached to an oligomer of the 
formula: 



in which X¹¹ and X¹² are complementary oligonucleotides and Z is a probe. The probe will have sufficient length such
that X¹¹ and X¹² form a double-stranded DNA portion of each member. X¹¹ and X¹² are as described above for X¹ and
X² respectively, except that for the present aspect of the invention, each member of the probe library can have the same
X¹¹ and the same X¹², and differ only in the probe portion. In one group of embodiments, X¹¹ and X¹² are either a poly-A
oligonucleotide or a poly-T oligonucleotide.

As noted above, each member of the library will typically have a different probe portion. The probes, Z, can be any
of a variety of structures for which receptor-probe binding information is sought for conformationally-restricted forms.
For example, the probe can be an agonist or antagonist for a cell membrane receptor, a toxin, venom, viral epitope,
hormone, peptide, enzyme, cofactor, drug, protein or antibody. In one group of embodiments, the probes are different
peptides, each having of from about 4 to about 12 amino acids. Preferably the probes will be linked via polyphosphate
diesters, although other linkages are also suitable. For example, the last monomer employed on the X¹¹ chain can be
a 5'-aminopropyl-functionalized phosphoramidite nucleotide (available from Glen Research, Sterling, Virginia, USA or
Genosys Biotechnologies, The Woodlands, Texas, USA) which will provide a synthesis initiation site for the carboxy to
amino synthesis of the peptide probe. Once the peptide probe is formed, a 3'-succinylated nucleoside (from Cruachem,
Sterling, Virginia, USA) will be added under peptide coupling conditions. In yet another group of embodiments, the
probes will be oligonucleotides of from 4 to about 30 nucleic acid monomers which will form a DNA or RNA hairpin
structure. For use in synthesis, the probes can also have associated functional groups (i.e., hydroxyl, amino, carboxylic
acid, anhydride and derivatives thereof) for attaching two positions on the probe to each of the complementary oligonucleotides.

The surface of the solid support is preferably provided with a spacer molecule, although it will be understood that
the spacer molecules are not elements of this aspect of the invention. Where present, the spacer molecules will be as
described above for L¹.

The libraries of conformationally restricted probes can also have virtually any number of members. As above, the
number of members will be limited only by design of the particular screening assay for which the library will be used,
and by the synthetic capabilities of the practitioner. In one group of embodiments, the library will have from 2 to 100
members. In other groups of embodiments, the library will have between 100 and 10,000 members, and between 10,000
and 1,000,000 members. Also as above, in preferred embodiments, the library will have a density of more than 100
members at known locations per cm², preferably more than 1,000 per cm², more preferably more than 10,000 per cm².
Preparation of these libraries can typically be carried out using any of the methods described above for the preparation
of oligonucleotides on a solid support (e.g., light-directed methods, flow channel or spotting methods).

X. Libraries of Intermolecular, Doubly-Anchored, Double-Stranded Oligonucleotides 


In another aspect, the present invention provides libraries of intermolecular, doubly-anchored, double-stranded oli- 
gonucleotides, each member of the library having the formula: 
In this formula, Y represents a solid support, X¹ and X² represent a pair of complementary oligonucleotides, and
L¹ and L² each represent a bond or a spacer. Typically, L¹ and L² are the same and are spacers having sufficient length
such that X¹ and X² can form a double-stranded oligonucleotide. The non-covalent binding which exists between X¹
and X² is represented by the dashed line.

The solid support can be any of the solid supports described herein for other aspects of the invention. Attached to
the solid support are spacers, L¹ and L². These spacers are the same as those described above for the unimolecular,
double-stranded oligonucleotide embodiments. Preferably, the spacers are comprised of a surface attaching portion,
which is a hydroxyalkyltriethoxysilane or an aminoalkyltriethoxysilane, and a longer chain portion which is derived from
a poly(ethylene glycol).

Attached to the distal ends of L¹ and L² are X¹ and X², respectively. X¹ and X² are each a single-stranded DNA or
RNA molecule. The oligonucleotides which are part of the present invention are typically of from about 4 to about 100
nucleotides in length. Preferably, X¹ and X² are each an oligonucleotide of about 6 to about 30 nucleotides in length.
The oligonucleotides are typically linked to L¹ or L² via the 3'-hydroxyl group of the oligonucleotide and a functional group
on L¹ which results in the formation of an ether, ester, carbamate or phosphate ester linkage.

In one group of preferred embodiments, X¹ and X² are complementary oligonucleotides of about 6 to about 30
nucleotides in length, and exhibit complementarity of from 90 to 100% over their entire length. Arrays, or libraries, of
these double-stranded oligonucleotides can be used to screen samples of DNA, RNA, proteins or drugs for their
sequence-specific interactions.

In another group of preferred embodiments, the 5'-terminal region of X¹ (the distal portion with reference to the solid
support) will be complementary to the 5'-terminal region of X² (the distal portion, again with reference to the solid support).
For example, X¹ and X² can each be an oligonucleotide of from about 10 to about 30 nucleotides in length. The 5' end
of X¹ will comprise of from about 4 to about 20 nucleotides which will be complementary to the 5' end of X² (see FIG.
10E). As above, the degree of complementarity will typically be from about 90 to about 100%, preferably about 100%.
Arrays, or libraries, of this group of embodiments can be used for the hybridization and ligation of additional oligonucleotides.
With reference to FIGS. 10E and 10F, libraries of oligonucleotides which are complementary in overlapping regions
of their 5' ends can be prepared (see FIG. 10E), then incubated with additional oligonucleotides which are complementary
to the 3' ends of the surface-bound oligonucleotides. After hybridization, a continuous helix is formed with a length
equivalent to the combination of the hybridized added oligonucleotides and the complementary portion of the surface-bound
oligonucleotides. Additionally, each strand will contain a nick between the added oligonucleotide and the surface-bound
oligonucleotide. In preferred embodiments, the surface-bound oligonucleotides are phosphorylated (chemically
or enzymatically with a kinase) such that the nick can be closed with a T4 DNA ligase to produce a contiguous intermolecular,
doubly-anchored, double-stranded oligonucleotide which is longer than either of the initially formed X¹ or X²
oligonucleotides.

Another application for this aspect of the invention is hybridization enhancement. This is illustrated in FIG. 10G. As
can be seen in FIG. 10G, a library of intermolecular, doubly-anchored, double-stranded oligonucleotides is prepared as
described above and as illustrated in FIG. 10E. Target oligonucleotides, having unknown sequences at their 3' termini,
are incubated with the library. Hybridization of the 3' end of the target oligonucleotide to the complementary portion of a
library member is enhanced by the cooperative nature of formation of the extended DNA duplex. Additionally, the hybridization
step can be followed by a ligation step (when the ends of the surface-bound oligonucleotides are phosphorylated)
to further enhance the discrimination of any 3' mismatches.

The libraries of this aspect of the invention can also have virtually any number of different members, and will be
limited only by the number or variety of compounds desired to be screened in a given application and by the synthetic
capabilities of the practitioner. In one group of embodiments, the library will have from 2 up to 100 members. In other
groups of embodiments, the library will have between 100 and 10,000 members, and between 10,000 and 1,000,000
members, preferably on a solid support. In preferred embodiments, the library will have a density of more than 100
members at known locations per cm², preferably more than 1,000 per cm², more preferably more than 10,000 per cm².
Preparation of these libraries can typically be carried out using any of the methods described above for the preparation
of oligonucleotides on a solid support (e.g., light-directed methods, flow channel or spotting methods). Typically,
the oligonucleotides X¹ and X² will be synthesized as a pair in each cell of the library. Such synthesis generally requires
that synthesis initiation sites be prepared having two different and independently removable protecting groups. For example,
a solid support (e.g., a glass coverslip) can be modified with a suitable linking group (e.g., hydroxypropyltriethoxysilane,
or the mono triethoxysilylpropyl ether of a polyethylene glycol having an appropriate length). The surface hydroxyl
groups which are present following the attachment of the linking groups can be uniformly protected with MeNPOC-Cl.
Controlled irradiation can be used to deprotect about half of the hydroxyl groups, which are subsequently protected as
DMT or MMT (mono-methoxy trityl) ethers. In this manner, each cell or portion of the solid support will have approximately
equivalent numbers of two linking groups bearing independently removable protecting groups. Synthesis of the library can
then proceed in a straightforward manner by removing the MeNPOC groups (by irradiation) in one cell and constructing
oligonucleotide X¹, then removing the DMT or MMT group in the same cell and constructing oligonucleotide X². Synthesis
in each of the cells or regions can proceed in a similar manner to produce the libraries of this aspect of the invention.
In this manner, using two rounds of synthesis following the initial steps to divide the available sites into independently
protected sites, it is possible to prepare arrays, or libraries, of regions containing pairs of complementary oligonucleotides
of any sequence.
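
The two-round scheme described above can be pictured with a small bookkeeping model. The Python sketch below is only an illustration of the ordering of steps, not a description of the chemistry: it assumes that exactly every other site ends up DMT-protected (the actual partial photolysis is statistical), it does not model re-protection of the growing chains between couplings, and the class name, function names and sequences are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Site:
    """One synthesis initiation site within a cell of the array."""
    protecting_group: Optional[str]                   # "MeNPOC", "DMT" or None
    chain: List[str] = field(default_factory=list)    # monomers, surface end first


def prepare_cell(n_sites: int) -> List[Site]:
    """Protect every site with MeNPOC, then convert half of them to DMT.

    Alternating sites stand in for 'controlled irradiation deprotects about
    half of the hydroxyl groups'.
    """
    sites = [Site("MeNPOC") for _ in range(n_sites)]
    for site in sites[::2]:
        site.protecting_group = "DMT"
    return sites


def synthesize(sites: List[Site], group: str, monomers: str) -> None:
    """Remove `group` in this cell and build one oligonucleotide on those sites."""
    for site in sites:
        if site.protecting_group == group:
            site.protecting_group = None
            site.chain.extend(monomers)


cell = prepare_cell(8)
synthesize(cell, "MeNPOC", "CGCGCATTCC")   # round 1: X1, written 3'->5' from the surface
synthesize(cell, "DMT", "GGAATGCGCG")      # round 2: X2, written 3'->5' from the surface
```

After both rounds, half of the sites in the cell carry the first sequence and half carry the second, which mirrors the "pair in each cell" arrangement described in the text.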

XI. Methods of Screening Libraries of Double-Stranded Oligonucleotides and Probes

A library prepared according to any of the methods described above can be used to screen for receptors having
high affinity for unimolecular, double-stranded oligonucleotides, intermolecular, doubly-anchored, double-stranded oligonucleotides
or conformationally restricted probes. In one group of embodiments, a solution containing a marked
(labelled) receptor is introduced to the library and incubated for a suitable period of time. The library is then washed free
of unbound receptor and the probes or double-stranded oligonucleotides having high affinity for the receptor are identified
by identifying those regions on the surface of the library where markers are located. Suitable markers include, but are
not limited to, radiolabels, chromophores, fluorophores, chemiluminescent moieties and transition metals. Alternatively,
the presence of receptors may be detected using a variety of other techniques, such as an assay with a labelled enzyme,
antibody, and the like. Other techniques using various marker systems for detecting bound receptor will be readily apparent
to those skilled in the art.

In a preferred embodiment, a library prepared on a single solid support (using, for example, the VLSIPS™ technique)
can be exposed to a solution containing marked receptor such as a marked antibody. The receptor can be marked in
any of a variety of ways, but in one embodiment marking is effected with a radioactive label. The marked antibody binds
with high affinity to an immobilized antigen previously localized on the surface. After washing the surface free of unbound
receptor, the surface is placed proximate to x-ray film or phosphorimagers to identify the antigens that are recognized
by the antibody. Alternatively, a fluorescent marker may be provided and detection may be by way of a charge-coupled
device (CCD), fluorescence microscopy or laser scanning.

When autoradiography is the detection method used, the marker is a radioactive label, such as ³²P. The marker on
the surface is exposed to X-ray film or a phosphorimager, which is developed and read out on a scanner. An exposure
time of about 1 hour is typical in one embodiment. Fluorescence detection using a fluorophore label, such as fluorescein,
attached to the receptor will usually require shorter exposure times.

Quantitative assays for receptor concentrations can also be performed according to the present invention. In a direct
assay method, the surface containing localized probes, prepared as described above, is incubated with a solution containing
a marked receptor for a suitable period of time. The surface is then washed free of unbound receptor. The amount
of marker present at predefined regions of the surface is then measured and can be related to the amount of receptor
in solution. Methods and conditions for performing such assays are well-known and are presented in, for example, L.
Hood et al., Immunology, Benjamin/Cummings (1978), and E. Harlow et al., Antibodies, A Laboratory Manual, Cold
Spring Harbor Laboratory, (1988). See also, U.S. Pat. No. 4,376,110 for methods of performing sandwich assays. The
precise conditions for performing these steps will be apparent to one skilled in the art.

A competitive assay method for two receptors can also be employed using the present invention. Methods of conducting
competitive assays are known to those of skill in the art. One such method involves immobilizing conformationally
restricted probes on predefined regions of a surface as described above. An unmarked first receptor is then bound to
the probes on the surface having a known specific binding affinity for the receptors. A solution containing a marked
second receptor is then introduced to the surface and incubated for a suitable time. The surface is then washed free of
unbound reagents and the amount of marker remaining on the surface is measured. In another form of competition
assay, marked and unmarked receptors can be exposed to the surface simultaneously. The amount of marker remaining
on predefined regions of the surface can be related to the amount of unknown receptor in solution. Yet another form of
competition assay will utilize two receptors having different labels, for example, two different chromophores.

In other embodiments, in order to detect receptor binding, the double-stranded oligonucleotides which are formed
with attached probes or with a flexible linking group will be treated with an intercalating dye, preferably a fluorescent
dye. The library can be scanned to establish a background fluorescence. After exposure of the library to a receptor
solution, the exposed library will be scanned or illuminated and examined for those areas in which fluorescence has
changed. Alternatively, the receptor of interest can be labeled with a fluorescent dye by methods known to those of skill
in the art and incubated with the library of probes. The library can then be scanned or illuminated, as above, and examined
for areas of fluorescence.
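
The comparison of a background scan with a post-exposure scan can be sketched as follows. This Python example is illustrative only: it assumes that each scan has already been reduced to one intensity value per library member, arranged as rows and columns, and it uses a two-fold change as the cut-off. Neither the data layout nor the threshold comes from the text; both are assumptions, as is the function name.

```python
from typing import List, Sequence, Tuple


def changed_regions(
    background: Sequence[Sequence[float]],
    exposed: Sequence[Sequence[float]],
    min_fold_change: float = 2.0,
) -> List[Tuple[int, int]]:
    """Return (row, column) of members whose fluorescence rose or fell by at
    least `min_fold_change` between the background and the exposed scan."""
    hits = []
    for r, (bg_row, ex_row) in enumerate(zip(background, exposed)):
        for c, (bg, ex) in enumerate(zip(bg_row, ex_row)):
            if bg <= 0 or ex <= 0:
                continue  # skip members without a usable signal
            ratio = ex / bg
            if ratio >= min_fold_change or ratio <= 1.0 / min_fold_change:
                hits.append((r, c))
    return hits


# Hypothetical 2 x 2 library: one member loses signal, one gains signal.
before = [[100.0, 100.0], [100.0, 100.0]]
after = [[105.0, 40.0], [98.0, 250.0]]
hits = changed_regions(before, after)
```

Both increases and decreases are reported because the text speaks only of areas in which fluorescence "has changed"; a receptor that displaces an intercalating dye would be expected to lower the signal.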

In instances where the libraries are synthesized on beads in a number of containers, the beads are exposed to a
receptor of interest. In a preferred embodiment the receptor is fluorescently or radioactively labelled. Thereafter, one or
more beads are identified that exhibit significant levels of, for example, fluorescence using one of a variety of techniques.
For example, in one embodiment, mechanical separation under a microscope is utilized. The identity of the molecule
on the surface of such separated beads is then determined using, for example, NMR, mass spectrometry, PCR amplification
and sequencing of the associated DNA, or the like. In another embodiment, automated sorting (i.e., fluorescence activated
cell sorting) can be used to separate beads (bearing probes) which bind to receptors from those which do not
bind. Typically the beads will be labeled and identified by methods disclosed in Needels, et al., Proc. Natl. Acad. Sci.
USA 90:10700-10704 (1993), incorporated herein by reference.

The assay methods described above for the libraries of the present invention will have tremendous application in
such endeavors as DNA "footprinting" of proteins which bind DNA. Currently, DNA footprinting is conducted using DNase
I digestion of double-stranded DNA in the presence of a putative DNA binding protein. Gel analysis of cut and protected
DNA fragments then provides a "footprint" of where the protein contacts the DNA. This method is both labor and time
intensive. See, Galas et al., Nucleic Acids Res. 5:3157 (1978). Using the above methods, a "footprint" could be produced
using a single array of unimolecular, double-stranded oligonucleotides in a fraction of the time of conventional methods.
Typically the protein will be labeled with a radioactive or fluorescent species and incubated with a library of unimolecular,
double-stranded DNA. Phosphor imaging or fluorescence detection will provide a footprint of those regions on the library
where the protein has bound. Alternatively, unlabeled protein can be used. When unlabeled protein is used, the double-stranded
oligonucleotides in the library will all be labeled with a marker, typically a fluorescent marker. Incorporation of
a marker into each member of the library can be carried out by terminating the oligonucleotide synthesis with a commercially
available fluorescing phosphoramidite nucleotide derivative. Following incubation with the unlabeled protein,
the library will be treated with DNase I and examined for areas which are protected from cleavage.

The assay methods described above for the libraries of the present invention can also be used in reverse drug
discovery. In such an application, a compound having known pharmacological safety or other desired properties (e.g.,
aspirin) could be screened against a variety of double-stranded oligonucleotides for potential binding. If the compound
is shown to bind to a sequence associated with, for example, tumor suppression, the compound can be further examined
for efficacy in the related diseases.

In other embodiments, probe arrays comprising β-turn mimetics can be prepared and assayed for activity against
a particular receptor. β-turn mimetics are compounds having molecular structures similar to β-turns which are one of
[...] a receptor which will correspond to the affinity exhibited by the β-turn and its receptor.
XII. Bioelectronic Devices and Methods

,„ aspect the prcse n, 

dioonucleotkle h,bridizat,on. * oeneral "^^^^^^"Ve patent), incorporated herein 

08/082.93^ ^ June » ^>^^^^^^ — ^ to — * ** 

"^nH^^^ 

DMA. orfirst oligonucleotide, is proved with an '^^^^^^J^ of the array. After 
probes, each of which bears an electron-acceptor tag and ^ p '^ hvSSSd aSv is illuminated to induce an electron 
Uidizaticndthefirstoligonucleotidetothearrayhaso^ 

transfer reaction in the direction of the surface of the array. The in an array will have 

on the surface where hybridization has taken place. Typxa.hr. each of he ^ ^ , n 

an attached electron-acceptor tag located near the ^2^^^^^/^^), the electron- 
embodiments in which the arrays are prepared by W*^™*** ^ Cached ei^er to the 3' monomer by 
acceptor tag will be located near the 9 position. The <^ £S2£«£ " monomer and the 

methods known to those of skill in the art. or rt can be attached to a ^^^^ to the ^ and 
solid support. Such a spacing group will have * d '^^^ digonucSde will 

the c4igonucleotide. a third functona. group^ 

typicallyhavetheelec^ tag can be added in 

™n^^ * — « 

(phen*) 2 (dppz) complex. marlost whirh with the electron-donor tag, will participate in an electron 

t ^e=^^^^ 

[...] the electron-donor tag is a ruthenium (II) (phen)₂(dppz) complex and the

electron-acceptor tag is a rhodium (III) (phi)₂(phen) complex. bioelectronic detection of sequence-specific

in still another aspect, the present .nventon P™*£ ^J^SjJStoTte which an array of o.igonu- 
oligonucleotide hybridization. The dev.ce w... y^^^^^^^tm sur face of the sensor and have 
cleotides are attached. The oligonudeofdes wdl be f a *?"?^**^° wiI1 be a tag whic h is capable of 
an electron-acceptor tag attached to each l*?^ 

b£r^^^^ 
XIII. Alternative Embodiments
A. Adhesives
B. Methods For Preparing Single-Stranded Nucleic Acid Sequences

a method of directing the synthesis of a sinale-strand^ » JS^vT P articular| y- th e Present invention provides 
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cleofides that would allow a large number of different nuriac ^JJTJ number a ^ of 

chip can have virtually any number of In one group of 

more than 10,000 per cm 2 . . . - mu tat©d- Oj oligonucleotides. 

In addition to In, taegon*.s^ 
H th. «MX h a. an internal position ot O, . W a.™ "^^^'.S^^The <*p can oonsis. 

templates for all possible, correctly ordered junctions. evnthesized on a chip and selectively released into 

in another embodiment the oligonucleotides. L *£^g£^£ 17C) Any gene or mutant gene can 
solution. This embodiment can be earned out using a photo-labile ^ ^ of ligat j on reactions, 
be synthesized by selectively releasing the desired ohgonuclecrtdes , nto soUrt «^ l ^3^lJi w determined 
This would provide an incredibly diverse mutant-generator, capacity. ^^^2Sd!?£f *e chip). A mutant 
by the irradiation steps used to release the specie set of oligc* (and ' «^^tJ^ Jioli^^« 
iuence or. alternatively, a family of ft. photolabile linker 

produce the desired reactant oligos. In this embodiment J"""™ 1 ne j? £ Moreover> ^ photo | ysis wa ve- 

^2^^^^^^^^ — indud * " " " 

limited to, ortho-nitrobenzyl groups and derivatives thereof. 
XIV. Examples

The following examples are provided to illustrate the efficacy of the inventions herein.
A. ENHANCED DISCRIMINATION USING RNase A 
' ™^ te i,,ust,at«^^ 

hybridization signals on high density oligonucleotide arrays. 
EXAMPLE I 

The highdens rt y array of oligonucleotide probeso 

standard VLS.PS protocol set forth set of DMA 15- 

tiling strategy described shown m F.g. 5. Brief ly. the c hip usee """"TjP^ of a , 3 to region spa nning the D- 
mers covalently linked to a glass surface. A set of 1ou 'J»*^^^^ 
.oopregionofhumanmitochondri^ 

B^^efaTe SSZ!!22££ ™DNA targeUence. one of fte four probes « be 
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the extent of tiybr4aS ate toSSS ta^S ' ' ncor,>0 "*'<l «"°»»<WM into the target DNA or RNA. 

tott«ta'9etseo^ence.thenBiiliicorredcal|-fer^i.(^^f. t 1 Bln ^ r ^ OTffi «*™a>"iPle™ntary 
1 -3 kb RNA „ me rn^ocr^naTXT^eSS^^ " ,eS " ta " '"*'* to < i < m * ■ 

contbcal lluorescence microscope r££££ SsI *£?T ■»"■"■ <*»"ted using a seeming 

ten seated wm, 75 ^ ot 0.2 IT^JSiT^T ?.* CMp 6 

interva , the chin is rinsed with fiYQQpc T ^ «, <. intervals of 10, 45. and 75 minutes. After each 

to determine the numb^ o7ccSbas £ 7<£ SSKST" ?"? * ^ (S * ^ * ^ r6SUttS are ana * zed 
from the RNase A treatmerte ' mult,p,e - base am b'9«'»es and miscalls, and the improvement resulting 

there"^ 

background. (These n^^S^SSSSS » Li ' the Si9 " al W3S n0t than **» *• 

ization time and ,enperatur,s^ 

mentation and labellinq The conditions nc^ h ar * i *u , * CTAB ' and the extent of R NA frag- 

to decree the r^7,££^£*Z S^S^JT^T"^ 0 " RNA - x °°™ *»<^ 

and 350 out of 458 ambiguities were correctly r^nh/ori TtJr* l!" , ' Crf218 miscalls were corrected, 

were resolved incorrectly and thT^ 46 ***** " ere initia,| y which 

A treatment. After th. wJhS^ ^T^fiS?? 31 Chan9ed t0 i " COrrect ca,,s after RNase 
hybridization results are c^!iS^.^^ ^TiSSS^ "TT? H ° W " h8n the 
are called correctly. These results clean* rf^nc* ♦ T . trea,men t approximately 87% of the 1302 bases 

•eqoenc. tnto^cT^l^SZ^OrlS £T ^ * "^*» «• «-* - •» 
B. ENHANCED DISCRIMINATION USING LIGATION REACTIONS

near^ei^ 

oligonucleotide probes on a wZJSZZJZ fiSlSS* ' oli 9° nucleotide s to the 5' end of 

has formed with correct basM^^J^J!^ J V**™ Wher9Ver 8 P"*e*«B« hybrid 

to serve as a temp.at Tf^TdSn an^a^n * 8 SUitab,e 3 ' overhan ° «* * e *0« 

discrimination of bLe^rmis™^„«S^/2f T" 9 ^ 1,16 ' igati0n r6aCti0n is used to im P™« 
following hybridization atonT ^°' the ^ 
[...] a chip is made with probes having the following sequence: P-P-A-A-CGCGCCGCNC-5', wherein:
P is a polyethylene glycol (PEG) spacer [...]
[...] following sequence (listed 5' to 3'):
[...] has the following sequence:

[...] is treated with T4 Polynucleotide Kinase in order to phosphorylate the 5' [...]
in 1 ml at 37°C for 90 minutes. [...] hybridization buffer because EDTA could interfere [...]

^tr-creS^ 

and 4000 units of T4 DNA Ligase [...] chip is vigorously washed [...]
plus 150 mM NaCl. The reaction is allowed to proceed for [...] hours at [...] after washing is that [...]

SsiE=25S35==== 
N       HYB         HDF     LIG     LDF
A       143         1.1     15      5.5
C       134         1.1     13      6.3
G"      [illegible] 1.0     82      1.0
T       110         1.4     20      4.1
In the above table, N is the base in the probe that [...]
[...] G is the [...] HYB [...]

[...] probe and the probes containing a mismatch [...] maximum difference is only 40%. [...] discrimination is greatly improved, with
the minimum discrimination factor greater than 4. These data [...] can be performed on cov[...] for base-paired [...]
single base mismatches.
EXAMPLE II

In this example, a chip was made with probes having the following sequences: 
P-P-A-A-CGCGCATTCN-5' (denoted CG)

[...] single-base mismatch sequences for the following 22-mer target oligos (listed 5' to 3'):
Fl-GCGCGTAAGGCCTTCGACGTAG (denoted OH1)
[...]tively) has the following sequence:
Fl-CGAAGG (denoted L6B)



specified. In partial 2X0 unte of tTSXSS^^^ m *' th ° S6 US6d h Examp,e 1 un,ess 
6-merislOnTrattS 

pairs^^ 

OH2 and the AT probefor S, %S^™££ £ 5?S h* *! ™ in the between 

oligo to. a greater extent than will OH2 und'pr ISS? ^ d ' 28 to ,ts perfectly complimentary probe 

the case inthe hyb^l^^ COnditions - ln fact ' this * be 

P^ng effects o^ 

is larger than in the AT regls^^ in » e °° «*« (OH, hybrids) 

region is quite strong^ ^ signa. in the AtUoT^^ 



        (OH1)           (OH2)
N       HYB     HDF     HYB*    HDF*
A       196     2.4     6       5.6
C"      474     1.0     22      1.0
G       159     3.0     20      1.7
T       103     4.6     5       6.6

* These values are somewhat uncertain because the signal is not large relative to the background.
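
The discrimination factors tabulated above appear to be the ratio of the perfect-match signal to the signal of each probe; the OH1 hybridization column is consistent with that reading (for example, 474/196 is approximately 2.4). The following is a minimal Python sketch of that calculation under this assumed definition. The function name `discrimination_factors` is hypothetical and is not part of the original disclosure.

```python
def discrimination_factors(signals, match_base):
    """Return the perfect-match signal divided by each probe's signal.

    Assumption: the discrimination factor is this simple ratio, so the
    perfect-match probe itself always scores 1.0.
    """
    reference = signals[match_base]
    return {base: round(reference / value, 1) for base, value in signals.items()}


# OH1 hybridization signals from the table (perfect match at N = C).
oh1_hyb = {"A": 196, "C": 474, "G": 159, "T": 103}
oh1_hdf = discrimination_factors(oh1_hyb, "C")
print(oh1_hdf)
```

The same function can be applied to the ligation signals to obtain the corresponding ligation discrimination factors.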



[...] the target molecules. A ligation [...] 2000 units of T4
DNA Ligase. The reaction is allowed to proceed for [...] 24 hours at 8°C. At each
stage, the chip is read and the data recorded and analyzed.
        34 hrs., T = 22°C               24 hrs., T = 8°C
        (OH1)           (OH2)           (OH1)           (OH2)
N       LIG     LDF     LIG     LDF     LIG     LDF     LIG     LDF
A       18      56      3       31      27      46      10      88
C       1003    1.0     92      1.0     1234    1.0     879     1.0
G       13      44      23      13      24      51      30      29
T       15      67      3       31      22      56      8       110
It is striking that after the ligation reaction at 8°C, the signals for OH1 and OH2 [...] by only a factor of 1.4, ten
times less than the factor of 14 that was observed [...]

[...] composition dependence is [...] ligation at low temperature with no loss of discrimination
for either OH1 or OH2.
EXAMPLE III

In order for the ligation strategy to be useful for unknown or more complex DNA targets, it is necessary to use a 

[...] each is mixed to make a solution that contains all 4096 labelled 6-mer oligonucleotides.
A chip is made containing 10-mer probes having the following sequences

where [...] contains 10-mers with all possible (4096) six-base combinations [...]

[...] the 5' phosphate group on the probes required for ligation is added chemically (using 5' Phosphate-ON,
[...]) as the last step in the synthesis of the chip prior to deprotection of the bases.

The target oligo is a 22-mer having the following sequence (listed 5' to 3*): 

[...] in 6X SSPE-T at 22°C for 30 minutes. The chip is read and analyzed.
The chip was initially hybridized with 10 nM OH1 [...] second highest hybridization signal.

^prSeT^ 

The chip is washed with water to remove the hybridized target. h «hririi«d with 1 0 nM 

Thus, with a complex chip containing 4096 probes with all possible 6-mer sequences at the 5' end, and using a pool
of all possible 6-mers, the ligation reaction is still specific for the perfectly complementary probe and affords
considerable increases in the discrimination between perfect matches and single-base mismatches.
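
The pool size of 4096 follows from the four bases taken six at a time (4^6 = 4096). As an illustration only, the complete set of 6-mer sequences can be enumerated with a short Python sketch; the helper name `all_kmers` is hypothetical and the enumeration says nothing about how the labelled oligonucleotides are actually synthesized or mixed.

```python
from itertools import product

BASES = "ACGT"


def all_kmers(k):
    """Enumerate every DNA sequence of length k in lexicographic order."""
    return ["".join(letters) for letters in product(BASES, repeat=k)]


pool = all_kmers(6)
print(len(pool))  # number of distinct 6-mers, i.e. 4**6
```

The ligatable 6-mer CGAAGG (denoted L6B in Example II) is one member of this set.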

EXAMPLE IV 

In this example, a chip was made using the tiling strategy (A, C, G, T-containing [...]
[...]). The chip is synthesized using standard VLSIPS protocols. Prior to hybridization [...]

[...]

[...] at about 45°C for [...] minutes, leaving only the labelled 6-mers that have been ligated to [...]
[...] in one of the four regions is greater than that in the other three related regions by at least a factor of 1.2 if none of the [...]

Following hybridization, the 11-mer probes with substitution positions -1, -2, -3 and -4 all [...]
[...]

[...] substitution positions -2 and -5 [...] correct calls and 2 miscalls [...]
correct calls and 1 ambiguity and 1 miscall, and substitution positions -1 and -4 both yielded [...] correct calls with no
ambiguities or miscalls. These results indicate that the ligation reaction with the full pool of 6-mers can be used to
specifically label hybrids between relatively complex targets and arrays of oligonucleotide [...]
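
The base-calling rule used with the tiling strategy compares the signals of the four related probes (A, C, G and T at the substitution position) and calls a base only when one signal exceeds the others by a minimum factor. The following Python sketch illustrates that rule and the three outcomes counted in the text (correct call, ambiguity, miscall). It is an illustration under stated assumptions: the threshold of 1.2 is read from a damaged passage and is therefore exposed as a parameter, and the function names `call_base` and `score_call` are hypothetical.

```python
def call_base(signals, min_ratio=1.2):
    """Call the base whose signal beats every other probe by min_ratio.

    Returns the called base, or None when the position is ambiguous.
    min_ratio=1.2 is an assumed default, not a confirmed value.
    """
    ranked = sorted(signals.items(), key=lambda item: item[1], reverse=True)
    (best_base, best), (_, runner_up) = ranked[0], ranked[1]
    if best <= 0:
        return None  # no usable signal at this position
    if runner_up <= 0 or best / runner_up >= min_ratio:
        return best_base
    return None


def score_call(signals, true_base, min_ratio=1.2):
    """Classify one position as 'correct', 'ambiguous', or 'miscall'."""
    called = call_base(signals, min_ratio)
    if called is None:
        return "ambiguous"
    return "correct" if called == true_base else "miscall"


# Ligation signals for OH1 at 8 degrees C from Example II (perfect match at C).
ligation = {"A": 27, "C": 1234, "G": 24, "T": 22}
# Hypothetical, weakly discriminated signals for comparison.
weak = {"A": 100, "C": 110, "G": 95, "T": 90}
print(score_call(ligation, "C"), score_call(weak, "C"))
```

Raising `min_ratio` trades miscalls for ambiguities, which is why the choice of threshold matters when comparing hybridization alone with hybridization followed by ligation.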

It is interesting to note that the pattern of ligation ([...] or weaker signals, better or worse discrimination) is not
[...] the same as the pattern of hybridization. This suggests that these two [...]
[...] sequence information with arrays of oligonucleotide probes. For example, probes that
[...] poorly discriminated may be better [...] using a [...]
[...] particular complementary target (leading to a signal that is too small relative to the background)

[...]

C. PREPARATION OF UNIMOLECULAR, DOUBLE-STRANDED OLIGONUCLEOTIDES 
EXAMPLE I 

[...] hydroxy[...]. Upon completion of the inner strand, another MeNPoc-protected [...]
the synthesis of the second strand proceeded in the normal fashion. Following the synthesis [...]

PEG linker [...] where S is the solid surface having silyl groups, P is a



EXAMPLE II



fluorescent tag. Thus, the action of the enzyme provided a functional test for DNA structure and also [...]
the chip was detected using confocal fluorescence microscopy. The action of various enzymes was determined by
monitoring the [...] of fluorescence from the molecules on the chip surface (e.g. "reading" the chip) upon
treatment with enzymes that can cut the DNA and release the fluorescent tag at the 5' end.

The three different enzymes used to characterize the structure of the molecules on the chip were: 

1) Mung Bean Nuclease - sequence independent, single-strand specific DNA endonuclease;

2) DNase I - sequence independent, double-strand specific endonuclease;

3) EcoRI - restriction endonuclease that recognizes the sequence (5'-3')
GAATTC in double-stranded DNA, and cuts between the G and the first A.

Mung Bean Nuclease and EcoRI were obtained from New England Biolabs, and DNase I was obtained from Boehringer
Mannheim. All enzymes were used at a concentration of 200 units per mL in the buffer recommended by the manufacturer.
The enzymatic reactions were performed in a 1 mL flow cell at 22°C, and were typically allowed to proceed for 90 minutes.
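
As an illustration of the EcoRI specificity described above (recognition of GAATTC, with the cut between the G and the first A), the following Python sketch locates the recognition sites in a sequence and splits it at the cut positions. It models only one strand of an idealized duplex and ignores the requirement that the site be double-stranded; the function names and the test sequence are hypothetical.

```python
ECORI_SITE = "GAATTC"
CUT_OFFSET = 1  # the cut falls between the G and the first A


def ecori_cut_positions(seq):
    """Return 0-based indices at which the given strand would be cut."""
    seq = seq.upper()
    positions = []
    start = seq.find(ECORI_SITE)
    while start != -1:
        positions.append(start + CUT_OFFSET)
        start = seq.find(ECORI_SITE, start + 1)
    return positions


def digest(seq):
    """Split seq into fragments at every EcoRI cut position."""
    bounds = [0] + ecori_cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical sequence containing a single EcoRI site.
print(digest("TTGAATTCAA"))
```

A sequence containing only an EcoRI half-site yields no cut positions in this model, which mirrors the expectation that a half-site on a single strand should not be cleaved unless it pairs with a complementary strand.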

Upon treatment of the chip with the enzyme EcoRI, the fluorescence signal in the DS EcoRI region and the 3 SS
regions with the EcoRI half-site on the outer strand was reduced by about 10% of its initial value. This reduction was
at least 5 times greater than for the other regions of the chip, indicating that the action of [...]
on chip. It was not possible to determine if the factor is greater than 5 in these preliminary experiments because of
uncertainty in the constancy of the fluorescence background. However, because the purpose of these early experiments
was to determine whether unimolecular double-stranded structures could be formed and whether they could be
specifically recognized by proteins (and not to provide a quantitative measure of enzyme specificity), qualitative differences
between the different synthesis regions were sufficient.

The reduction in signal in the 3 SS regions with the EcoRI half-site on the outer strand indicated either that the
enzyme cuts single-stranded DNA with a particular sequence, or that these molecules formed a double-stranded
structure that was recognized by the enzyme. The molecules on the chip surface were at a relatively high density, with an
[...] angstroms. Thus, it was possible for the outer strand of one molecule to form a
double-stranded structure with the outer strand of a neighboring molecule. In the case of the SS regions with the
EcoRI half-site on the outer strand, such a bimolecular double-stranded region would have the correct sequence and
structure to be recognized by EcoRI. However, it would differ from the unimolecular double-stranded molecules in that
the inner strand remains single-stranded and thus amenable to cleavage by a single-strand specific endonuclease such
as Mung Bean Nuclease. Therefore, it was possible to distinguish unimolecular from bimolecular double-stranded DNA
molecules on the surface by their ability to be cut by single and double-strand specific endonucleases.

[...] to remove all molecules that have single-stranded structures and to identify unimolecular double-stranded
molecules, the chip was first exhaustively treated with Mung Bean Nuclease. The reduction in the fluorescence [...]
was greater by about a factor of 2 for the SS regions of the chip, including those with the EcoRI half-site on the outer
strand that were cleaved by EcoRI, than for the 4 DS regions. Following Mung Bean [...]

treated with either DNase I (which cuts all remaining double-stranded molecules) or EcoRI (which should cut only the

[...]) [...] in the 4 DS regions was reduced by at least 5-fold more than the signal in the SS regions. Upon EcoRI [...] the
signal in the single DS region with the correct EcoRI sequence was reduced by at least a factor of 3 more than the
signal in any other region on the chip. Taken together, these results indicated that the surface-bound molecules synthe[...]

[...] were resistant to a single-strand specific endonuclease and were recognized by both a double-strand specific
endonuclease and a sequence-specific restriction enzyme.

EXAMPLE III

This example illustrates the strategy employed for the preparation of a conformationally restricted hexapeptide.
A glass coverslip having aminopropylsilane spacer groups can be further derivatized on the amino groups with a
poly-A oligonucleotide comprising nine adenosine monomers using VLSIPS™ ("light-directed") methods. The tenth
adenine monomer to be added will be a 5'-aminopropyl-functionalized phosphoramidite [...] (available from
Genosys Biotechnologies). To the amine terminus is then added, in stepwise fashion, the hexapeptide, [...]
beginning with the carboxyl end of the peptide (i.e., as T-V-V-K-F-Q-R). A 3'-succinylated nucleoside can then be added under
peptide coupling conditions and the nucleotide synthesis of the poly-T tail can be continued to provide a conformationally
[...]

It is to be understood that the above description is intended to be illustrative and not restrictive. Many embodiments
will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should
therefore be determined not with reference to the above description, but should instead be determined with reference
to the appended claims, along with the full scope of equivalents to which such claims are entitled.
[...] provides greatly improved methods and apparatus for the study of nucleotide sequences and
[...] and not restrictive. Many embodiments and variations of the invention will become apparent to those of skill in the art
[...]
[...] reference to the above description, but instead should be determined with
reference to the appended claims along with the full scope of equivalents to which such claims are entitled.



SEQUENCE LISTING 



(1) GENERAL INFORMATION:



(i) APPLICANT: 

(A) NAME: Affymax Technologies, N.V. 

(B) STREET: De Reyderkade 62 
(C) CITY: Curacao

(D) STATE: 

(E) COUNTRY : Netherlands Antilles 

(F) POSTAL CODE (ZIP) : 

(G) TELEPHONE: 

(H) TELEFAX: 
(I) TELEX:

(ii) TITLE OF INVENTION: Methods of Enzymatic Discrimination 
Enhancement and Surface-Bound Double-Stranded DNA 

(iii) NUMBER OF SEQUENCES: 42

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Hepworth Lawrence Bryer & Bizley 

(B) STREET: Merlin House, Falconry Court, Baker's Lane, 

(C) TOWN: Epping 

(D) COUNTY: Essex 

(E) COUNTRY: UK 

(F) POST CODE: CM16 5DQ 
(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: EP 95 307501.7 

(B) FILING DATE: 20-OCT-1995 

(C) CLASSIFICATION: 
(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US 08/327,522

(B) FILING DATE: 21-OCT-1994

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US 08/327,687

(B) FILING DATE: 24-OCT-1994

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US 08/533,582

(B) FILING DATE: 18-OCT-1995

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Richard Edward Bizley 

(B) REFERENCE/ DOCKET NUMBER: APEP95996 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: +44 1992 561756 

(B) TELEFAX : +44 1992 561934 



(2) INFORMATION FOR SEQ ID NO: 1:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
AGCCTAGCTG AA 12



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

12 

TTCAGCTAGG CT 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 10 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
TTTTTAAAAA 10

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
AAAAATTTTT 

10 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
AAAGAAAAAA GACAGTACTA AATGGA 



(2) INFORMATION FOR SEQ ID NO:6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
AGTACTGTNT TTTTT 

15
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 15 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

15 

TAGTACTGNC TTTTT 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
TTACTACTNG CTTTT 15 

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CTGTATCCGA CATCTGGTTA A 21 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
CCAACCAAAC CCC

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 13 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CCAACCAAAM NMM 13

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
ACTGTTAGCT AATTGG 16

(2) INFORMATION FOR SEQ ID NO: 13:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
GGGGGGAGCT AACGGG 

16 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
TACTGTATTT TTT 

13
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 13 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
TACTGTCTTT TTT 13

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

13 

TACTGTGTTT TTT 

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
TACTGTTTTT TTT 13

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

13 

GTACTGACTT TTT 






(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
GTACTGCCTT TTT



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 13 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(Xi) SEQUENCE DESCRIPTION : SEQ ID NO: 20: 
GTACTGGCTT TTT 

13 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GTACTGTCTT TTT 

13 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
AGTACTATCT TTT 

(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

13 

AGTACTCTCT TTT 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

13 

AGTACTGTCT TTT 

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
AGTACTTTCT TTT 13

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

11 

GGGNCCCTTA A 






(2) INFORMATION FOR SEQ ID NO: 27: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
TAAAGTAAGA CATAAC



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 16 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
GGCTGACGTC AGCAAT 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
TTGCTGACAT CAGCC 

15 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
TTGCTGACCT CAGCC 

15
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 15 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

15 

TTGCTGACGT CAGCC 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

15 

TTGCTGACTT CAGCC 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE:
(A) NAME/KEY: modified_base

(B) LOCATION: 12

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = adenine covalently modified
at the 3' hydroxyl group with 2
polyethylene glycol (PEG) spacers"






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
CNCGCCGCGC AN 12 



(2) INFORMATION FOR SEQ ID NO: 34 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 1

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = guanine covalently modified
at the 5' hydroxyl group with a
fluorescein molecule"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
NCGCGGCGCG AACGCAACGC



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA 



(ix) FEATURE:
(A) NAME/KEY: modified_base

(B) LOCATION: 12

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = adenine covalently modified
at the 3' hydroxyl group with 2
polyethylene glycol (PEG) spacers"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
NCTTACGCGC AN 12



(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE:
(A) NAME/KEY: modified_base

(B) LOCATION: 12

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = adenine covalently modified
at the 3' hydroxyl group with 2
polyethylene glycol (PEG) spacers"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
NCTTAATATA AN 12



(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 1

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = guanine covalently modified
at the 5' hydroxyl group with a
fluorescein molecule"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
NCGCGTAAGG CCTTCGACGT AG 22
(2) INFORMATION FOR SEQ ID NO: 38: 






(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 1

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = [...]ine covalently modified
at the 5' hydroxyl group with a
fluorescein molecule"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
NATATTAAGG CCTTCGACGT AG 22



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: modif ied_base 

(B) LOCATION: 10 

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently modified
at the 3' hydroxyl group with 2
polyethylene glycol (PEG) spacers"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
NNNNNNGCGN 10

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 10 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 






(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: modified_base 

(B) LOCATION: 10 

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently modified
at the 3' hydroxyl group with 2
polyethylene glycol (PEG) spacers"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
CCTTACGCGN 10 






(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

Arg Gln Phe Lys Val Val Thr
1 5 



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 7 amino acids 
(B) TYPE: amino acid

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

Thr Val Val Lys Phe Gln Arg
1 5 
Claims

1. A method for sequencing a target nucleic acid, said method comprising:

(a) combining:

(i) a substrate comprising an array of chemically synthesized and positionally distinguishable
oligonucleotides each of which is complementary to a defined subsequence of preselected length; and

(ii) a target nucleic acid; thereby forming target-oligonucleotide hybrid complexes of complementary
subsequences of known sequence;

(b) contacting said target-oligonucleotide hybrid complexes with a nuclease; thereby removing
target-oligonucleotide complexes which are not perfectly complementary; and

(c) determining which of said oligonucleotides have specifically interacted with subsequences in said target
nucleic acid, to determine the sequence of said target nucleic acid.

2. The method as recited in claim 1 wherein said target nucleic acid is ribonucleic acid (RNA), optionally said nuclease 
is an RNA nuclease, preferably RNase A. 

3. A method for sequencing a target nucleic acid, said method comprising:

(a) combining:

(i) a substrate comprising an array of chemically synthesized and positionally distinguishable
oligonucleotides each of which is complementary to a defined subsequence of preselected length; and

(ii) a target nucleic acid which is longer than each of said probes; thereby forming target-oligonucleotide
hybrid complexes of complementary subsequences of known sequence with a 3' target overhang;

(b) contacting said target-oligonucleotide hybrid complexes with a ligase and a labelled, ligatable oligonucleotide
probe;

(c) removing unbound target nucleic acid and labelled, unligated oligonucleotide probes; and 

(d) determining which of said oligonucleotides contain said labelled, ligatable oligonucleotide probe as an
indication of a subsequence which is complementary to a subsequence of said target nucleic acid.

4. The method as recited in claim 1 or claim 3 wherein said target nucleic acid is deoxyribonucleic acid (DNA).

5. The method as recited in claim 4 when dependent on claim 1 wherein said nuclease is a DNA nuclease, preferably 
DNA nuclease S1 nuclease or Mung Bean nuclease. 

6. The method as recited in any preceding claim wherein said array of oligonucleotides recognizes substantially all
possible subsequences of preselected length found in said target nucleic acid. 

7. The method as recited in any preceding claim, wherein each oligonucleotide is of a length between about 6 and 20
bases, preferably between about 8 and 15 bases.

8. The method as recited in any preceding claim, wherein said array of oligonucleotides comprises about 1,000 different
oligonucleotides, preferably about 3,000 different oligonucleotides, preferably about 10⁴ different oligonucleotides,
more preferably about 10⁵ different oligonucleotides, even more preferably about 10⁶ different oligonucleotides.

9. The method as recited in any one of claims 3, 4 or 6 to 8 wherein said ligase is a member selected from the group
consisting of T4 DNA ligase, ligases isolated from E. coli and ligases isolated from bacteriophages. 

10. A method for sequencing an unlabeled target oligonucleotide, said method comprising: 

(a) combining:

(i) a substrate comprising an array of positionally distinguishable oligonucleotide probes each of which has 
a constant region and a variable region, said variable region capable of binding to a defined subsequence 
of preselected length; 
(ii) a constant oligonucleotide having a sequence which is complementary to said constant region of said
oligonucleotide probes; 

(iii) a target oligonucleotide to be sequenced; and 

(iv) a ligase, thereby forming target-oligonucleotide hybrid complexes of complementary subsequences of
known sequence;

(b) contacting said target oligonucleotide-oligonucleotide probe hybrid complexes with a ligase and a pool of 
labelled, ligatable oligonucleotide probes of a preselected length, said pool of labelled, ligatable oligonucleotide 
probes representing all possible sequences of said preselected length; 

(c) removing unbound target nucleic acid and labelled, unligated oligonucleotide probes; and

(d) determining which of said oligonucleotide probes contain said labelled, ligatable oligonucleotide probe as 
an indication of a subsequence which is complementary to a subsequence of said target oligonucleotide. 

11. A method for sequencing an unlabelled target oligonucleotide, said method comprising:

(a) contacting an unlabelled target oligonucleotide with a library of labelled oligonucleotide probes, each of said 
oligonucleotide probes having a known sequence and being attached to a solid support at a known position, to 
hybridize said target oligonucleotide to at least one member of said library of probes, thereby forming a hybridized
library;

(b) contacting said hybridized library with a nuclease capable of cleaving double-stranded oligonucleotides to
release from said hybridized library a portion of said labelled oligonucleotide probes or fragments thereof; and

(c) identifying said positions of said hybridized library from which labelled probes or fragments thereof have 
been removed, to determine the sequence of said unlabelled target oligonucleotide. 
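
The read-out step of claim 11 can be pictured with a highly simplified sketch: probes whose sequence is perfectly complementary to some subsequence of the target form double-stranded hybrids, and those positions are the ones expected to lose label after nuclease treatment. The names `reverse_complement` and `cleaved_positions` are hypothetical, and the model assumes perfect-match hybridization only, ignoring mismatches, partial duplexes and reaction kinetics.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def cleaved_positions(target, probes_by_position):
    """Return the sorted positions whose probe fully hybridizes to the target.

    probes_by_position maps an array position to the known probe sequence
    (5'->3') attached there. A probe is treated as hybridized, and therefore
    as a substrate for the double-strand-specific nuclease, only when its
    reverse complement occurs in the target (an idealized assumption).
    """
    hits = []
    for position, probe in probes_by_position.items():
        if reverse_complement(probe) in target:
            hits.append(position)
    return sorted(hits)


if __name__ == "__main__":
    probes = {0: "TAATC", 1: "TGTAA", 2: "CCCCC"}
    print(cleaved_positions("GATTACA", probes))
```

In this toy example the probes at positions 0 and 1 are complementary to the subsequences GATTA and TTACA of the target, so those positions are reported; overlapping such subsequences is what would allow the target sequence to be inferred.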

12. A synthetic unimolecular, double-stranded oligonucleotide library comprising a plurality of different members, each
member having the formula: 

Y-L¹-X¹-L²-X²

wherein,

Y is a solid support; 

X¹ and X² are a pair of complementary oligonucleotides;
L¹ is a spacer;

L² is a linking group having sufficient length such that X¹ and X² form a double-stranded oligonucleotide.

13. A library in accordance with claim 12, wherein L² is a member selected from the group consisting of an alkylene
group, a polyethyleneglycol group, a polyalcohol group, a polyamine group and a polyester group.

14. A library in accordance with claim 12 or claim 13, wherein X¹ and X² are complementary oligonucleotides each
comprising of from 6 to 30 nucleic acid monomers.
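
The relationship between the two complementary segments of a unimolecular member can be sketched as follows. This is only an illustration under stated assumptions: the class `HairpinMember`, the function `make_member` and the linker label "PEG" are hypothetical, both segments are written in the same 5'->3' direction along the single chain, and the second segment is taken to be the exact reverse complement of the first so that the chain can fold back on itself across the linking group.

```python
from dataclasses import dataclass

# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class HairpinMember:
    """One library member: first segment, linking group, second segment."""
    x1: str
    linker: str
    x2: str


def make_member(x1, linker="PEG"):
    """Build a member whose second segment can pair with the first.

    The 6 to 30 monomer bounds follow the range recited in claim 14.
    """
    if not 6 <= len(x1) <= 30:
        raise ValueError("segment length must be from 6 to 30 monomers")
    return HairpinMember(x1=x1, linker=linker, x2=reverse_complement(x1))


if __name__ == "__main__":
    print(make_member("GATTACA"))
```

The sketch does not model the spacer, the solid support or the physical length requirement on the linking group; it only captures the sequence constraint between the two segments.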

15. A library in accordance with any one of claims 12 to 14, wherein said solid support is a silica support and L¹ comprises
an aminoalkylsilane and from 1 to 4 hexaethyleneglycols.

16. A synthetic unimolecular, double-stranded oligonucleotide library of any one of claims 12 to 15, wherein a portion
of said double-stranded oligonucleotides formed by X¹ and X² further comprise a bulge or a loop.

17. A synthetic unimolecular, double-stranded nucleic acid library of any one of claims 12 to 16, wherein each member
further comprises an identifier tag, said identifier tag identifying the sequence of said unimolecular, double-stranded
nucleic acid.

18. A synthetic unimolecular, double-stranded nucleic acid library of any one of claims 12 to 17, wherein said solid
support comprises a first bead linked to a second bead, wherein the double-stranded nucleic acid is attached to the 
first bead and an identifier tag is attached to the second bead. 

19. A method of forming a plurality of diverse unimolecular, double-stranded oligonucleotides on a solid support having
optional spacers, said support comprising a surface with a plurality of preselected regions, said method comprising: 

(a) forming on each of said preselected regions a different first oligonucleotide, each of said first oligonucleotides
comprising of from 6 to 30 monomers; 

(b) attaching to the distal end of each of said first oligonucleotides of step (a) a linking group; and

(c) forming on the distal end of each of said linking groups a second oligonucleotide, wherein each of said
second oligonucleotides is complementary to said first oligonucleotide which is attached within the same preselected
region, and wherein said linking groups have sufficient length such that said first and second oligonucleotides
form a unimolecular, double-stranded oligonucleotide.

20. A method of screening a sample for a species capable of binding to double-stranded DNA comprising: 

contacting said sample with a solid support comprising unimolecular, double-stranded DNA attached thereon, 
each of said attached DNA independently having the formula:

-X¹¹-L-X¹²

wherein, 

X¹¹ and X¹² are complementary oligonucleotides; and
L is a linking group having sufficient length such that X¹¹ and X¹² form said attached unimolecular, double-stranded
DNA, to produce at least one bound pair comprising said species and one of said attached unimolecular,
double-stranded DNA; and

identifying said bound pair. 

21. A method in accordance with claim 20, wherein said species is a member selected from the group consisting of a
drug, a protein and an RNA molecule. 

22. A method of screening a sample for a species capable of binding to double-stranded DNA comprising: 

contacting said sample with a solid support comprising a unimolecular, double-stranded DNA attached thereon,
said attached DNA having the formula:

-X¹¹-L-X¹²

wherein, 

X¹¹ and X¹² are complementary oligonucleotides; and

L is a linking group having sufficient length such that X¹¹ and X¹² form said attached unimolecular, double-stranded
DNA, to produce a bound pair comprising said species and said attached unimolecular, double-stranded
DNA; and

identifying said bound pair. 


23. A synthetic conformationally-restricted probe library comprising a plurality of members, each of said members com- 
prising a solid support attached to an oligomer having the formula: 

-X¹¹-Z-X¹²

wherein, 

X¹¹ and X¹² are complementary oligonucleotides; and

Z is a probe having sufficient length such that X¹¹ and X¹² form a double-stranded portion of said member
and thereby restrict the conformations available to said probe.

24. A synthetic library in accordance with claim 23, wherein each of said probes is a peptide having of from about 4 to 
about 12 amino acids and optionally each member further comprises an intercalating dye. 

25. A method of synthesizing a library of conformationally-restricted probes on a solid support having optional spacers,
said support comprising a surface with a plurality of preselected regions, said method comprising:

(a) forming on each of said preselected regions a first oligonucleotide, each of said first oligonucleotides com- 
prising of from 6 to 30 monomers; 

(b) attaching to the distal end of each of said first oligonucleotides of step (a) a probe; and

(c) forming on the distal end of each of said probes a second oligonucleotide, wherein each of said second
oligonucleotides is complementary to said first oligonucleotide which is attached within the same preselected
region, and wherein said probes have sufficient length such that said first and second oligonucleotides form a
unimolecular, double-stranded oligonucleotide thereby conformationally-restricting said probes.

26. A method in accordance with claim 19 or claim 25, wherein said method of construction of step (a) and step (b) is 
by light-directed synthesis. 

27. A method of screening a sample for a species capable of binding to a conformationally-restricted probe comprising:

contacting said sample with a solid support comprising conformationally-restricted probes attached thereon, 
each of said attached probes independently having the formula:

-X¹¹-Z-X¹²

wherein, 

X¹¹ and X¹² are complementary oligonucleotides; and
Z is a probe having sufficient length such that X¹¹ and X¹² form a double-stranded oligonucleotide portion of
said conformationally-restricted probe, to produce at least one bound pair comprising said species and one of said
attached conformationally-restricted probes; and

identifying said bound pair. 

28. An adhesive for use in biological applications comprising a first surface having a plurality of attached oligonucleotides
and a second surface having a plurality of attached oligonucleotides, wherein the oligonucleotides of said first surface
are substantially complementary to the oligonucleotides of said second surface.

29. A synthetic intermolecular, doubly-anchored, double-stranded oligonucleotide library comprising a plurality of dif- 
ferent members, each member having the formula: 

[structural formula not reproduced]

wherein,

Y is a solid support; 

X¹ and X² are a pair of complementary oligonucleotides;
L¹ and L² are each independently a bond or a spacer.

30. A library in accordance with claim 29, wherein L¹ and L² each independently comprise a member selected from the
group consisting of an alkylene group, a polyethyleneglycol group, a polyalcohol group, a polyamine group and a
polyester group, preferably L¹ and L² each independently comprise a polyethyleneglycol group.

31. A library in accordance with claim 29 or claim 30, wherein X¹ and X² are complementary oligonucleotides each
comprising of from 6 to 30 nucleic acid monomers.

32. A library in accordance with any one of claims 29 to 31, wherein said solid support is a silica support and L¹ comprises
an aminoalkylsilane and from 1 to 4 hexaethyleneglycols. 

33. A method of preparing a single-stranded nucleic acid sequence, said method comprising:

(a) forming a hybrid complex by combining at least two oligonucleotides which are phosphorylated at their 5'
ends with a chip-bound oligonucleotide, said chip-bound oligonucleotide having subsequences which are complementary
to a subsequence of each of said oligonucleotides;

(b) contacting said hybrid complex with a ligase to form a ligated oligonucleotide; and 

(c) releasing said ligated oligonucleotide from said chip-bound oligonucleotide to form a single-stranded nucleic 
acid sequence. 

FIGURE 15